Thermodynamic efficiency of information and heat flow 
A basic task of information processing is information transfer (flow). Here we study a pair of Brownian particles, each coupled to a thermal bath, at temperatures $T_1$ and $T_2$, respectively. The information flow in such a system is defined via the time-shifted mutual information. The information flow nullifies at equilibrium, and its efficiency is defined as the ratio of the flow over the total entropy production in the system. For a stationary state the information flows from higher to lower temperatures, and its efficiency is bounded from above by $\max[T_1,T_2]/|T_1-T_2|$. This upper bound is imposed by the second law and it quantifies the thermodynamic cost of information flow in the present class of systems. It can be reached in the adiabatic situation, where the particles have widely different characteristic times. The efficiency of heat flow, defined as the heat flow over the total amount of dissipated heat, is limited from above by the same factor. There is a complementarity between heat flow and information flow: the setup which is most efficient for the former is the least efficient for the latter and vice versa. The above bound for the efficiency can be [transiently] overcome in certain non-stationary situations, but the efficiency is still limited from above. We study yet another measure of information processing [transfer entropy] proposed in the literature. Though this measure does not require any thermodynamic cost, the information flow and transfer entropy are shown to be intimately related for stationary states.
I. INTRODUCTION

Relations between statistical thermodynamics and information science have long been recognized; they have been a source of mutual fertilization, but occasionally also of confusion. Pertinent examples are the nature of the maximum entropy principle [1] and the analysis of the Maxwell demon concept [?]. Then there is the huge field of information processing devices. Their continuing miniaturization [?] is already approaching the nanometer scale, which makes it obligatory to study the information carriers as physical entities subject to the laws of statistical physics. Note that information processing constitutes a certain class of tasks (functionality) to be implemented on a physical system. Basic tasks of this kind are, inter alia, information erasure and information transfer.

The thermodynamic cost of information erasure has received much attention [?]; see [?] for a review. Information erasure is governed by the Landauer principle [?], which, together with its limitations [?], has been investigated from different perspectives.

The task of information transfer is not uniquely defined, but has to be specified before one can start any detailed physical investigation (see below). Its thermodynamic cost in the presence of a thermal bath has been studied in [?]. There is an essential difference between the problem of thermodynamic costs during information erasure and those during information transfer. In the former case the information carrier has to be an open, or even dissipative, system, while in the latter case the external bath frequently plays the role of a hindrance [?]. For a finite, conservative system the problem of thermodynamic costs is not well-defined, since for such systems the proper measures of irreversibility and dissipation are lacking; see, however, [24] in this context. Thus, the thermodynamic cost for information transfer is to be studied in specific macroscopic settings.

The current status of the problem is controversial: the literature contains statements claiming both the existence [?] and the absence [?] of inevitable thermodynamic costs during information transfer. The arguments against fundamental bounds on thermodynamic costs in information transfer [?] rely in essence on the known statistical-physics fact that the entropy generated during a process can be nullified by nullifying the rate of this process [?]. However, this and related arguments leave unanswered the question of whether there are thermodynamic costs in the more realistic case of finite-rate information transfer. In practice, information transfer should normally proceed at a finite rate.

The presently known arguments in favour of the existence of a thermodynamic cost during information transfer are either heuristic [?], or concentrate on those aspects of computation and communication which require some energy for carrying out the task, but this energy is not necessarily dissipated; see [?] (footnote 1). The present work aims at clarifying how processes of information and heat flow relate to dissipative characteristics such as entropy production or heat dissipation. In particular, we clarify which tasks of information processing do (not) require a thermodynamic cost.

1 The authors of [?] point out that although in principle the energy required for carrying out these tasks need not be dissipated, in practice it is normally dissipated, at least partially.

We shall work with the (supposedly) simplest set-up that allows one to study the above problem: two (classical) Brownian particles, each interacting with a thermal bath. The model combines two basic ingredients needed for studying information transfer: randomness, which is necessary for the very notion of information, and directedness, i.e., the possibility of inducing a current of physical quantities via externally imposed gradients of temperature and/or various potentials.

For the present bi-partite problem, whose dynamics is formulated as a Markov process for the random coordinates $X_1(t)$ and $X_2(t)$ of the Brownian particles, the mutual information is given by the known Shannon expression $I[X_1(t):X_2(t)]$ [?] (footnote 2). This is an ensemble property, which is naturally symmetric, $I[X_1(t):X_2(t)]=I[X_2(t):X_1(t)]$, and has the same status as other macroscopic observables in statistical physics, e.g., the average energy. The information flow $\imath_{2\to1}$ is defined via the time-shifted mutual information: $\imath_{2\to1}=\partial_\tau I[X_1(t+\tau):X_2(t)]\big|_{\tau\to+0}$. The full rate $\frac{d}{dt}I[X_1(t):X_2(t)]$ of mutual information is now separated into the information that has flown from the first to the second particle, $\imath_{1\to2}$, and vice versa: $\frac{d}{dt}I[X_1(t):X_2(t)]=\imath_{1\to2}+\imath_{2\to1}$. We discuss this definition in section IV, explain its relation to prediction and clarify the meaning of a negative information flow. The usage of the time-shifted mutual information for quantifying the information flow was advocated in [?].
The information flow is an asymmetric quantity, $\imath_{2\to1}\neq\imath_{1\to2}$, and it allows one to distinguish between the source of information and its recipient (such a distinction cannot be made via the mutual information, which is a symmetric quantity). Another important feature of information flow is that, for the considered bi-partite Markovian system, it nullifies in equilibrium: $\imath_{2\to1}=\imath_{1\to2}=0$, because the information flow appears to be related to the entropy flow; see section V.

Once we have shown that the information flow is absent in equilibrium, we concentrate on the case when this flow is induced by a temperature gradient. Despite some formal similarities, a temperature gradient and a potential gradient, the two main sources of non-equilibrium, are of different physical origin and enjoy different features; see [44] for a recent discussion. Hence we expect that the specific features of the information flow will differ depending on the type of non-equilibrium situation. In this context, the temperature-gradient situation is perhaps the first one to study, since the function of information transfer comes close to the function of a thermodynamic machine. This far-reaching analogy is reflected in the features of the efficiency of information transfer, which is defined as the useful product, i.e., the information flow, divided by the total waste, as quantified by the entropy production in the overall system. In the stationary two-temperature state, where both the information flow and the entropy production are time-independent, the efficiency of information flow is limited from above by $\max[T_1,T_2]/|T_1-T_2|$. This expression depends only on the temperatures and does not depend on various details of the system (such as damping constants, inter-particle potentials, etc.). The existence of such an upper bound implies a definite thermodynamic cost for information flow. Interestingly, the upper bound for the efficiency can be reached in the adiabatic situation, when the source of information is much slower than its recipient. This fact to some extent resembles the reachability of the Carnot bound for the thermodynamic efficiency of heat engines and refrigerators. The upper bound $\max[T_1,T_2]/|T_1-T_2|$ for the efficiency of information flow can be well surpassed in certain non-stationary, transient situations. Even then the efficiency of information flow is limited from above via the physical parameters of the system.

2 We stress that the employed notion of information concerns its syntactic aspects and does not concern its semantic [meaning] and pragmatic [purpose] aspects. Indeed, the problems we intend to study (physics of information carriers, thermodynamic cost of information flow, etc.) refer primarily to syntactic aspects.

We shall argue that there is a clear parallel between the definitions of information flow and heat flow. This is additionally underlined by the fact that in the stationary state both heat and information flow from higher temperatures to lower ones. Moreover, the efficiency of heat flow, defined accordingly as the ratio of the heat flow to the heat dissipation rate, appears to be limited from above by the same factor $\max[T_1,T_2]/|T_1-T_2|$. There is, however, an important complementarity here: the upper bound for the efficiency of heat flow is reached exactly for that setup which is the worst one for the efficiency of information flow; see section VIII.

The information flow $\imath_{2\to1}$ characterizes the predictability of (the future of) 1 (first particle) from the viewpoint of 2 (second particle). Another aspect of information processing in stochastic systems concerns the prediction by 1 of its own future, and the help provided by 2 in accomplishing this task. This is essentially the notion of Granger causality, first proposed in econometrics for quantifying causal relations between coupled stochastic processes [33, 34], and formalized in information-theoretic terms via the concept of transfer entropy [35, 36]. It appears that this type of information processing does not require any thermodynamic cost, i.e., the transfer entropy, in contrast to the information flow, does not necessarily nullify at equilibrium. Despite this, there are interesting relations between the transfer entropy and the information flow, which are partially uncovered in section IX.

The paper is organized as follows. Section II recalls the definitions of entropy and mutual information; this reminder is continued in Appendix A. Section III defines the class of models to be studied. In section IV we discuss in detail the information-theoretic definition of information flow. This discussion is continued in section V by relating the information flow to the entropy flow. The same section recalls the concepts of entropy production and heat dissipation. The efficiency of information flow is defined and studied in section VI. Many of the obtained results will be illustrated via an exactly solvable example of coupled harmonic oscillators; see section VII. Section VIII studies the efficiency of heat flow, while in section IX we compare the concept of information flow with the notion of transfer entropy. Our results are briefly summarized in section X. Some technical questions are relegated to the Appendices.

II. BASIC CONCEPTS: ENTROPY AND 
MUTUAL INFORMATION 

The purpose of this section is to recall the definition 
of entropy and mutual information. 



A. Entropy

How can one quantify the information content of a random variable $X$ with realizations $x_1,\ldots,x_n$ and probabilities $p(x_1),\ldots,p(x_n)$? This is routinely done by means of the entropy

$$S[X]=-\sum_{k=1}^{n}p(x_k)\ln p(x_k). \tag{1}$$

$S[X]$ is well known in physics. Its information-theoretic meaning, which is based on the law of large numbers, is recalled in Appendix A. One can arrive at the same form (1) by imposing certain axioms, which are intuitively expected from the notion of uncertainty or information content [25, 26, 27].

For a continuous random variable $X$ the entropy converges to

$$S[X]=-\int dx\,P(x)\ln P(x)+\text{additive constant}, \tag{2}$$

where $P(x)$ is the probability density of $X$, and where the additive constant is normally irrelevant, since it cancels out when calculating any entropy differences. The quantity $e^{-\int dx\,P(x)\ln P(x)}$ characterizes the [effective] volume of the support of $P(x)$ [?].

B. Mutual Information

We shall consider the task of predicting some present state $Y(t)$ of one subsystem from the present state $X(t)$ of another subsystem and vice versa. For this purpose we introduce two dependent random variables $X$ and $Y$ with realizations $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$ and joint probabilities $\{p(x_k,y_l)\}_{k,l=1}^{n}$.

Assume that we learned a realization $x_l$ of $X$. This allows us to redefine the probabilities of the various realizations of $Y$: $p(y_k)\to p(y_k|x_l)$, where $p(y_k|x_l)$ is the conditional probability. Due to this redefinition the entropy of $Y$ also changes: $S[Y]\to S[Y|x_l]=-\sum_{k=1}^{n}p(y_k|x_l)\ln p(y_k|x_l)$. Averaging $S[Y|x_l]$ over $p(x_l)$ we get

$$S[Y|X]=-\sum_{k,l=1}^{n}p(y_k,x_l)\ln p(y_k|x_l).$$

This conditional entropy characterizes the average residual entropy of $Y$: $S[Y|X]\leq S[Y]$ (footnote 3). If there is a bijective function $f(\cdot)$ such that $Y=f(X)$ ($X=f^{-1}(Y)$), then $S[Y|X]=0$. We have $S[Y|X]=S[Y]$ for independent random variables $X$ and $Y$.

The mutual information $I[Y:X]$ between $X$ and $Y$ is that part of the entropy of $Y$ which is due to the missing knowledge about $X$. To define $I[Y:X]$ we subtract the residual entropy $S[Y|X]$ from the unconditional entropy $S[Y]$:

$$I[Y:X]=S[Y]-S[Y|X] \tag{3}$$
$$\phantom{I[Y:X]}=\sum_{k,l=1}^{n}p(x_k,y_l)\ln\frac{p(x_k,y_l)}{p_X(x_k)\,p_Y(y_l)}, \tag{4}$$

where $p_X$ and $p_Y$ are the marginal probabilities $p_X(x_k)=\sum_{l=1}^{n}p(x_k,y_l)$ and $p_Y(y_l)=\sum_{k=1}^{n}p(x_k,y_l)$.

The mutual information is non-negative, $I[Y:X]\geq0$, symmetric, $I[Y:X]=I[X:Y]$, and characterizes the entropic response of one variable to fluctuations of another. For two bijectively related random variables $I[X:Y]=S[X]$, and we return to the entropy. For independent random quantities $X$ and $Y$, $I[X:Y]=0$. Conversely, $I[X:Y]=0$ implies that $X$ and $Y$ are independent. Thus $I[X:Y]$ is a non-linear correlation function between $X$ and $Y$.

The information-theoretic meaning of the mutual information $I[X:Y]$ is recalled in Appendix A. $I[X:Y]$ is related to the information shared via a noisy channel with input $X$ and output $Y$, or, alternatively, with input $Y$ and output $X$.

For continuous random variables $X$ and $Y$ with the joint probability density $P(x,y)$, the mutual information reads

$$I[X:Y]=\int dx\,dy\,P(x,y)\ln\frac{P(x,y)}{P_X(x)\,P_Y(y)}, \tag{5}$$

where additive constants have cancelled after taking the difference in (3).
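The discrete definitions above can be checked numerically. The following is a minimal sketch, assuming a hypothetical $2\times2$ joint table `joint[k][l]` $=p(x_k,y_l)$ chosen only for illustration; the function names are not from the paper. It evaluates the entropy (1), the conditional entropy, and the mutual information (4), and confirms the identity (3).

```python
import math

def entropy(p):
    """Shannon entropy S = -sum p ln p (natural logarithm), skipping zero entries."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def conditional_entropy(joint):
    """S[Y|X] = -sum_{k,l} p(x_k, y_l) ln p(y_l | x_k) for joint[k][l] = p(x_k, y_l)."""
    p_x = [sum(row) for row in joint]
    total = 0.0
    for k, row in enumerate(joint):
        for p in row:
            if p > 0.0:
                total -= p * math.log(p / p_x[k])
    return total

def mutual_information(joint):
    """I[X:Y] computed from the joint table as in Eq. (4)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = 0.0
    for k, row in enumerate(joint):
        for l, p in enumerate(row):
            if p > 0.0:
                total += p * math.log(p / (p_x[k] * p_y[l]))
    return total

# Hypothetical joint distribution of two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
p_y = [sum(col) for col in zip(*joint)]
mi = mutual_information(joint)

# Eq. (3): I[Y:X] = S[Y] - S[Y|X]
assert abs(mi - (entropy(p_y) - conditional_entropy(joint))) < 1e-12
print("I[X:Y] =", mi)
```

For a factorized table the function returns zero, and for a bijective relation it returns $S[X]$, in line with the limiting cases stated above.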



3 Note that generally $S[Y|x_l]\leq S[Y]$ does not hold, i.e., it is not true that any single observation reduces the entropy. Such a reduction occurs only on average.
III. MODEL CLASS: TWO COUPLED
BROWNIAN PARTICLES

Consider two Brownian particles with coordinates $\mathbf{x}=(x_1,x_2)$ interacting with two independent thermal baths at temperatures $T_1$ and $T_2$, respectively, and subjected to a potential [Hamiltonian] $H(\mathbf{x})$. The corresponding time-dependent random variables will be denoted via $(X_1(t),X_2(t))$; their realizations are $(x_1,x_2)$.

The overdamped limit of the Brownian dynamics is defined by the following two conditions [46]: i) The characteristic relaxation time of the (real) momenta $m_i\dot{x}_i$ is much smaller than that of the coordinates. This condition is satisfied due to strong friction and/or small mass [46]. ii) One is interested in times which are much larger than the relaxation time of the momenta, but which can be much smaller than [or comparable to] the relaxation time of the coordinates. Under these conditions the dynamics of the system is described by the Langevin equations [?]:

$$0=-\partial_iH-\Gamma_i\dot{x}_i+\eta_i(t),\qquad\langle\eta_i(t)\,\eta_j(t')\rangle=2\Gamma_iT_i\,\delta_{ij}\,\delta(t-t'),\qquad i,j=1,2, \tag{6}$$

where $\Gamma_i$ are the damping constants, which characterize the coupling of the particles to the respective baths, $\delta_{ij}$ is the Kronecker symbol, and where $\partial_i=\partial/\partial x_i$. It is assumed that the relaxation time toward the total equilibrium (where $T_1=T_2$) is much larger than all considered times; thus for our purposes $T_1$ and $T_2$ are constant parameters.
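To make the overdamped Langevin dynamics concrete, here is a minimal Euler-Maruyama sketch. It assumes a harmonic Hamiltonian $H=a_1x_1^2/2+a_2x_2^2/2+\kappa x_1x_2$; the parameters `a1`, `a2`, `kappa`, the step size, and the function name are illustrative choices, not taken from the paper. Each step uses $dx_i=-(\partial_iH/\Gamma_i)\,dt+\sqrt{2T_i\,dt/\Gamma_i}\,\xi$ with a standard normal $\xi$, which follows from the noise correlator in (6).

```python
import math
import random

def simulate(T1, T2, gamma1=1.0, gamma2=1.0, a1=1.0, a2=1.0, kappa=0.5,
             dt=1e-3, steps=200_000, seed=0):
    """Euler-Maruyama integration of the overdamped equations
    Gamma_i dx_i/dt = -dH/dx_i + eta_i(t) for a harmonic H (an assumed example).
    Returns time averages (<x1^2>, <x2^2>, <x1 x2>); the mean is zero by symmetry."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    s1 = math.sqrt(2.0 * T1 * dt / gamma1)  # noise amplitude per step, particle 1
    s2 = math.sqrt(2.0 * T2 * dt / gamma2)  # noise amplitude per step, particle 2
    sum11 = sum22 = sum12 = 0.0
    for _ in range(steps):
        f1 = -(a1 * x1 + kappa * x2)  # -dH/dx1
        f2 = -(a2 * x2 + kappa * x1)  # -dH/dx2
        x1, x2 = (x1 + f1 * dt / gamma1 + s1 * rng.gauss(0.0, 1.0),
                  x2 + f2 * dt / gamma2 + s2 * rng.gauss(0.0, 1.0))
        sum11 += x1 * x1
        sum22 += x2 * x2
        sum12 += x1 * x2
    return sum11 / steps, sum22 / steps, sum12 / steps

if __name__ == "__main__":
    # Two baths at different temperatures: a stationary non-equilibrium state.
    print(simulate(T1=2.0, T2=1.0, steps=50_000))
```

For $T_1=T_2=T$ the long-time averages should approach the Gibbs values $T\,A^{-1}$, where $A$ is the matrix of the quadratic form $H$, up to statistical and discretization errors.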

Eq. (6) comes from the Newton equation (mass × acceleration = conservative force + friction force + random force) upon neglecting the mass × acceleration term due to strong friction and/or small mass [46]. Among many other realizations, Eq. (6) may physically be realized via two coupled RLC circuits. Then $\Gamma_i=R_i$ corresponds to the resistance of each circuit, $x_i$ is the charge, while the noise $\eta_i$ refers to the random electromotive force. The overdamped regime would refer here to small inductance $L_i$ and/or large resistance $R_i$, while the Hamiltonian part $H(x_1,x_2)$ collects the separate effects of the capacitances $C_1$ and $C_2$, as well as the capacitance-capacitance coupling, e.g., $H=\frac{x_1^2}{2C_1}+\frac{x_2^2}{2C_2}+\kappa x_1x_2$ in the harmonic regime. This example will be studied in section VII.
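Below we use the following shorthands:

$$\partial_t\equiv\partial/\partial t,\quad\partial_i\equiv\partial/\partial x_i,\quad\mathbf{x}=(x_1,x_2),\quad d\mathbf{x}=dx_1dx_2,\quad\mathbf{y}=(y_1,y_2),\quad d\mathbf{y}=dy_1dy_2. \tag{7}$$

The joint probability distribution $P(x_1,x_2;t)$ satisfies the Fokker-Planck equation [46]

$$\partial_tP(\mathbf{x};t)+\sum_{i=1}^{2}\partial_iJ_i(\mathbf{x};t)=0, \tag{8}$$
$$\Gamma_iJ_i(\mathbf{x};t)=-P(\mathbf{x};t)\,\partial_iH(\mathbf{x})-T_i\,\partial_iP(\mathbf{x};t), \tag{9}$$

where $J_1$, $J_2$ are the currents of probability. Eq. (8) is supplemented by the standard boundary conditions

$$P(x_1,x_2;t)\to0\quad\text{when}\quad x_1\to\pm\infty\ \text{or}\ x_2\to\pm\infty. \tag{10}$$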



A. Chapman-Kolmogorov equation. 

To put our discussion in a more general context, let us recall that the process described by (8, 9) is Markovian and satisfies the Chapman-Kolmogorov equation [46]:

$$P(\mathbf{x};t+\tau)=\int d\mathbf{y}\,P(\mathbf{x};t+\tau|\mathbf{y};t)\,P(\mathbf{y};t), \tag{11}$$

where for $\tau\to0$ the conditional probability density $P(\mathbf{x};t+\tau|\mathbf{y};t)$ is written as [46]

$$P(\mathbf{x};t+\tau|\mathbf{y};t)=\delta(x_1-y_1)\,\delta(x_2-y_2)+\tau\,\delta(x_2-y_2)\,G_1(x_1|y_1;x_2)+\tau\,\delta(x_1-y_1)\,G_2(x_2|y_2;x_1)+O(\tau^2), \tag{12}$$

where we have defined

$$G_1(x_1|y_1;x_2)=\frac{1}{\Gamma_1}\,\partial_1\left[\delta(y_1-x_1)\,\partial_1H(\mathbf{x})+T_1\,\partial_1\delta(y_1-x_1)\right], \tag{13}$$
$$G_2(x_2|y_2;x_1)=\frac{1}{\Gamma_2}\,\partial_2\left[\delta(y_2-x_2)\,\partial_2H(\mathbf{x})+T_2\,\partial_2\delta(y_2-x_2)\right]. \tag{14}$$

Note from (12)-(14) that

$$P(x_1;t+\tau|\mathbf{y};t)\,P(x_2;t+\tau|\mathbf{y};t)=P(x_1,x_2;t+\tau|\mathbf{y};t)+O(\tau^2), \tag{15}$$

which means that the conditional dependence of $X_1(t+\tau)$ and $X_2(t+\tau)$, given $X_1(t)$ and $X_2(t)$, vanishes to second order in $\tau$.

Eqs. (8, 9) are recovered after substituting (12)-(14) into (11) and noting $P(\mathbf{x};t+\tau)=P(\mathbf{x};t)+\tau\,\partial_tP(\mathbf{x};t)+O(\tau^2)$:

$$\partial_tP(\mathbf{x};t)=\int dy_1\,G_1(x_1|y_1;x_2)\,P(y_1,x_2;t)+\int dy_2\,G_2(x_2|y_2;x_1)\,P(x_1,y_2;t). \tag{16}$$
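As a consistency check, the first term of (16) can be worked out explicitly. The sketch below assumes that $G_1$ has the form $G_1(x_1|y_1;x_2)=\Gamma_1^{-1}\partial_1[\delta(y_1-x_1)\,\partial_1H+T_1\,\partial_1\delta(y_1-x_1)]$, with $\partial_1$ acting on $x_1$, which is how (13) is read here; the delta functions then simply replace $y_1$ by $x_1$:

```latex
\begin{align}
\int dy_1\, G_1(x_1|y_1;x_2)\, P(y_1,x_2;t)
  &= \frac{1}{\Gamma_1}\,\partial_1 \int dy_1
     \left[\delta(y_1-x_1)\,\partial_1 H(\mathbf{x})
           + T_1\,\partial_1 \delta(y_1-x_1)\right] P(y_1,x_2;t) \\
  &= \frac{1}{\Gamma_1}\,\partial_1
     \left[P(\mathbf{x};t)\,\partial_1 H(\mathbf{x})
           + T_1\,\partial_1 P(\mathbf{x};t)\right] \\
  &= -\,\partial_1 J_1(\mathbf{x};t),
\end{align}
```

where the last step uses the definition (9) of the probability current. The second term of (16) gives $-\partial_2J_2$ in the same way, so that (16) reduces to the Fokker-Planck equation (8).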



IV. INFORMATION-THEORETICAL 
DEFINITION OF INFORMATION FLOW 

A. Task 

We consider the gain (or loss) in predicting the future of subsystem 1 from the present state of subsystem 2. This task can be quantified by the information flow $\imath_{2\to1}$, which is defined via the time-shifted mutual information

$$\imath_{2\to1}(t)=\partial_\tau I[X_1(t+\tau):X_2(t)]\big|_{\tau\to+0}=\lim_{\tau\to+0}\frac{1}{\tau}\Big(I[X_1(t+\tau):X_2(t)]-I[X_1(t):X_2(t)]\Big). \tag{17}$$

$\imath_{1\to2}(t)$ is defined analogously by interchanging 1 and 2.
Recall that the mutual information $I[X_1(t+\tau):X_2(t)]$ quantifies (non-linear) statistical dependencies between $X_1(t+\tau)$ and $X_2(t)$, i.e., it quantifies the extent to which the present (at time $t$) of $X_2$ can predict the future of $X_1$. Thus a positive $\imath_{2\to1}(t)$ means that the future of $X_1$ is more predictable for $X_2$ than the present of $X_1$. Thus, for $\imath_{2\to1}(t)>0$, 2 is gaining control over 1 (or 2 sends information to 1) (footnote 4). Likewise, $\imath_{2\to1}(t)<0$ means that 2 is losing control over 1, or that 1 gains autonomy with respect to 2. This is the meaning of a negative information flow.

4 For $\imath_{2\to1}(t)>0$, 2 is like a chief sending orders to its subordinate 1; the fact that 1 will behave according to these orders makes its future more predictable from the viewpoint of the present of 2.

Noting from (5) that

$$I[X_1(t+\tau):X_2(t)]=\int dx_1\,dy_2\,P_2(y_2;t)\,P_{1|2}(x_1;t+\tau|y_2;t)\ln\frac{P_{1|2}(x_1;t+\tau|y_2;t)}{P_1(x_1;t+\tau)},$$

we work out $\imath_{2\to1}(t)$ with the help of (12)-(14):

$$\imath_{2\to1}(t)=\int d\mathbf{y}\,dx_1\,G_1(x_1|y_1;y_2)\,P(\mathbf{y};t)\ln\frac{P_{1|2}(x_1|y_2;t)}{P_1(x_1;t)}. \tag{18}$$

Employing now (8, 9) and the boundary conditions (10) we get from (18)

$$\imath_{2\to1}(t)=\int d\mathbf{x}\,\ln\left[\frac{P_1(x_1;t)}{P(\mathbf{x};t)}\right]\partial_1J_1(\mathbf{x};t). \tag{19}$$

Parametrizing the Hamiltonian as

$$H(x_1,x_2)=H_1(x_1)+H_2(x_2)+H_{12}(x_1,x_2), \tag{20}$$

where $H_{12}(x_1,x_2)$ is the interaction Hamiltonian, we obtain from (9) and (19)

$$\imath_{2\to1}(t)=\frac{1}{\Gamma_1}\int d\mathbf{x}\,P(\mathbf{x};t)\left[\partial_1H_{12}(\mathbf{x})+T_1\,\partial_1\ln P(\mathbf{x};t)\right]\partial_1\ln\frac{P_1(x_1;t)}{P(\mathbf{x};t)}. \tag{21}$$

The time-shifted mutual information was employed for quantifying information flow in reaction-diffusion systems [29], neuronal ensembles [30], coupled map lattices [31], and ecological dynamics [32].

B. Basic features of information flow

1. As deduced from (19), the information flow is generally not symmetric,

$$\imath_{2\to1}(t)\neq\imath_{1\to2}(t),$$

but the symmetrized information flow (17) is equal to the rate of mutual information:

$$\imath_{2\to1}(t)+\imath_{1\to2}(t)=\frac{d}{dt}I[X_1(t):X_2(t)]. \tag{22}$$

While the mutual information is symmetric with respect to interchanging its arguments, $I[X_1(t):X_2(t)]=I[X_2(t):X_1(t)]$, and quantifies correlations between two random variables, the information flow is capable of distinguishing the source from the recipient of information: $\imath_{2\to1}(t)>0$ means that 2 is the source of information, and 1 is its recipient.

For $\imath_{2\to1}(t)>0$ and $\imath_{1\to2}(t)>0$ we have a feedback regime, where 1 and 2 are both sources and recipients of information. Now the interaction between 1 and 2 builds up the mutual information $I[X_1(t):X_2(t)]$; see (22). In contrast, for $\imath_{2\to1}(t)<0$ and $\imath_{1\to2}(t)<0$ both particles are detached from each other, and the mutual information naturally decays.

For $\imath_{2\to1}(t)>0$ and $\imath_{1\to2}(t)<0$ we have a one-way flow of information: 2 is the source and 1 is the recipient; likewise, for $\imath_{2\to1}(t)<0$ and $\imath_{1\to2}(t)>0$, 1 is the source and 2 is the recipient. A particular, but important case of this situation is when the mutual information is conserved: $\frac{d}{dt}I[X_1(t):X_2(t)]=0$. This is realized in a stationary case; see below. Now due to $\imath_{2\to1}(t)+\imath_{1\to2}(t)=0$ the information behaves as a conserved resource (e.g., as energy): the amount of information lost by 2 is received by 1, and vice versa. Another example of a one-way flow of information is $\imath_{2\to1}(t)>0$ and $\imath_{1\to2}(t)\simeq0$.
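The sign regimes described in this item can be summarized in a few lines of code. This is only an illustrative sketch: the function name, the regime labels, and the tolerance `tol` used to decide when a flow counts as zero are assumptions made here, not part of the paper.

```python
def classify_flow(i21, i12, tol=1e-12):
    """Classify the regime from the two information flows.

    i21 is the flow from 2 to 1, i12 the flow from 1 to 2.
    Returns (regime, rate), where rate = i21 + i12 is the rate of change
    of the mutual information according to Eq. (22)."""
    rate = i21 + i12
    pos21 = i21 > tol
    pos12 = i12 > tol
    if pos21 and pos12:
        regime = "feedback"          # both particles are sources and recipients
    elif pos21:
        regime = "one-way 2->1"      # 2 is the source, 1 the recipient
    elif pos12:
        regime = "one-way 1->2"      # 1 is the source, 2 the recipient
    elif i21 < -tol and i12 < -tol:
        regime = "decay"             # particles decorrelate, mutual information decays
    else:
        regime = "unclassified"      # cases not discussed in the text
    return regime, rate

# Stationary one-way flow: the two flows cancel and the mutual information is conserved.
print(classify_flow(0.3, -0.3))
```

In the stationary case the returned rate is zero, which corresponds to the conserved-resource picture described above.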

2. The information flow $\imath_{2\to1}$ can be represented as [see (3, 17)]

$$\imath_{2\to1}(t)=\frac{dS[X_1(t)]}{dt}-\partial_\tau S[X_1(t+\tau)|X_2(t)]\big|_{\tau\to+0}. \tag{23}$$

The first term on the RHS of (23) is the change of the marginal entropy of $X_1$, while the second term is the change of the conditional entropy of $X_1$ with $X_2$ being frozen to the value $X_2(t)$. In other words, $\imath_{2\to1}(t)$ is that part of the entropy change of $X_1$ (between $t$ and $t+\tau$) which exists due to fluctuations of $X_2(t)$; see section II B. This way of looking at the information flow is close to that suggested in [40]; see also [41, 42] for related works. Appendix B studies in more detail the operational meaning of the freezing operation.

3. Eq. (21) implies that the information flow $\imath_{2\to1}(t)$ can be divided into two components, a force-driven part $\imath^{F}_{2\to1}(t)$ and a bath-driven (or fluctuation-driven) part $\imath^{B}_{2\to1}(t)$:

$$\imath_{2\to1}(t)=\imath^{F}_{2\to1}(t)+\imath^{B}_{2\to1}(t),$$
$$\imath^{F}_{2\to1}(t)=\frac{1}{\Gamma_1}\int d\mathbf{x}\,P\,[\partial_1H_{12}]\,\partial_1\ln\frac{P_1}{P}, \tag{24}$$
$$\imath^{B}_{2\to1}(t)=-\frac{T_1}{\Gamma_1}\int d\mathbf{x}\,P_1P_{2|1}\,[\partial_1\ln P_{2|1}]^2, \tag{25}$$

where for simplicity we omitted all integration variables. The force-driven part $\imath^{F}_{2\to1}$ nullifies together with the force $-\partial_1H_{12}$ acting from the second particle on the first particle. The fluctuation-driven part $\imath^{B}_{2\to1}(t)$ nullifies together with the bath temperature $T_1$ (which means that the random force acting from the first bath is zero).

Note that although $\imath^{F}_{2\to1}(t)$ is defined via the interaction Hamiltonian $H_{12}$, it does not suffer from the known ambiguity related to the definition of $H_{12}$. That is, redefining $H_{12}(\mathbf{x})$ via $H_{12}(\mathbf{x})\to H_{12}(\mathbf{x})+f_1(x_1)+f_2(x_2)$, where $f_1(x_1)$ and $f_2(x_2)$ are arbitrary functions, will not alter $\imath^{F}_{2\to1}(t)$ (and will not alter $\imath_{2\to1}(t)$, of course). Thus, changes in the separate Hamiltonians $H_1(x_1)$ and $H_2(x_2)$ that do not alter the probabilities (e.g., sudden changes) do not influence $\imath^{F}_{2\to1}(t)$. In contrast, sudden changes of the interaction Hamiltonian will, in general, contribute to $\imath^{F}_{2\to1}(t)$.

The bath-driven contribution $\imath^{B}_{2\to1}(t)$ to the information flow is negative, which means that for 2 to be a source of information, i.e., for $\imath_{2\to1}(t)>0$, the force-driven part $\imath^{F}_{2\to1}(t)$ should be sufficiently positive. In short, there is no transfer of information without force.

4. It is seen from (19) [or from (21)] that if the variables $X_1$ and $X_2$ are independent, i.e., $P(\mathbf{x};t)=P_1(x_1;t)\,P_2(x_2;t)$, the information flow $\imath_{2\to1}$ nullifies:

$$\imath_{2\to1}(t)=\imath_{1\to2}(t)=0\quad\text{for}\quad P=P_1P_2. \tag{26}$$

In fact, both $\imath^{F}_{2\to1}(t)$ and $\imath^{B}_{2\to1}(t)$ nullify for $P=P_1P_2$.

If the two particles were interacting at times $t<t_{\rm switch}$, but the interaction Hamiltonian $H_{12}$ [see (20)] is switched off at $t=t_{\rm switch}$, the information flow $\imath_{2\to1}=\imath^{B}_{2\to1}$ will in general be different from zero for times $t_{\rm relax}+t_{\rm switch}>t>t_{\rm switch}$, where $t_{\rm relax}$ is the relaxation time, since at these times the common probability will still be non-factorized, $P\neq P_1P_2$. However, since now $\imath_{2\to1}(t)=\imath^{B}_{2\to1}(t)<0$, the particles can only decorrelate from each other: neither of them can be a source of information for the other.

We note that the bath-driven part $\imath^{B}_{2\to1}(t)=-\frac{T_1}{\Gamma_1}\int dx_1\,P_1(x_1)\,\mathcal{F}_1(x_1)$ in (25) is proportional to the average Fisher information $\mathcal{F}_1(x_1)$,

$$\mathcal{F}_1(x_1)=\int dx_2\,P_{2|1}(x_2|x_1)\,[\partial_1\ln P_{2|1}(x_2|x_1)]^2.$$

$1/\mathcal{F}_1(x_1)$ is the minimal variance that can be reached during any unbiased estimation of $x_1$ from observing $x_2$ [?]. In the sense of estimation theory $\mathcal{F}_1(x_1)$ quantifies the information about $x_1$ contained in $x_2$, since $\mathcal{F}_1(x_1)$ is larger for those distributions $P_{2|1}(x_2|x_1)$ whose support is concentrated at $x_2\approx x_1$ (footnote 5). We conclude that once the interaction between 1 and 2 is switched off, 1 gets detached (diffuses away) from 2 at a rate equal to the diffusion constant $T_1/\Gamma_1$ times the average Fisher information about $x_1$ contained in $x_2$.
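The Fisher information entering the bath-driven flow can be evaluated numerically for a simple case. The sketch below assumes a Gaussian conditional density $P_{2|1}(x_2|x_1)$ with mean $a\,x_1$ and standard deviation $s$; the parameters `a`, `s`, the grid settings, and the function names are illustrative assumptions. For this choice the Fisher information is $a^2/s^2$ for every $x_1$, so the bath-driven flow after switching off the interaction would be $-(T_1/\Gamma_1)\,a^2/s^2$.

```python
import math

def log_cond(x2, x1, a, s):
    """ln P_{2|1}(x2|x1) for a Gaussian with mean a*x1 and standard deviation s."""
    return -0.5 * ((x2 - a * x1) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))

def fisher_information(x1, a, s, n=20001, width=10.0, h=1e-5):
    """Integral over x2 of P_{2|1} * (d/dx1 ln P_{2|1})^2, using a uniform grid
    of n points over +-width standard deviations and a central difference in x1."""
    lo = a * x1 - width * s
    dx = 2.0 * width * s / (n - 1)
    total = 0.0
    for k in range(n):
        x2 = lo + k * dx
        score = (log_cond(x2, x1 + h, a, s) - log_cond(x2, x1 - h, a, s)) / (2.0 * h)
        total += math.exp(log_cond(x2, x1, a, s)) * score ** 2 * dx
    return total

def bath_driven_flow(T1, gamma1, fisher_avg):
    """Bath-driven information flow: minus the diffusion constant T1/gamma1
    times the average Fisher information."""
    return -(T1 / gamma1) * fisher_avg

F = fisher_information(x1=0.3, a=0.8, s=0.5)
print(F, bath_driven_flow(T1=2.0, gamma1=1.0, fisher_avg=F))
```

A narrower conditional density (smaller $s$) gives a larger Fisher information and hence a faster loss of correlations, in line with the interpretation given above.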



5 For various applications of the Fisher information in the physics of Brownian motion see [?].
FIG. 1: Time-discretized version of the joint dynamical evolution.

5. To visualize information flow, we time-discretize the two-particle system and consider the two time series $X_1(t)$, $X_2(t)$ with $t=1,2,\ldots$. We model the system dynamics by a first-order Markov process, the discrete analogue of a first-order differential equation. It is described by the graphical model depicted in Fig. 1.

The graph is indeed meant in the sense of a causal structure [28], where an arrow indicates direct causal influence. In particular, the arrows in Fig. 1 mean that $X_1(t+1)$ and $X_2(t+1)$ become independent after conditioning on $(X_1(t),X_2(t))$; see (15) in this context. Assume, for the moment, that the arrow between $X_2(t)$ and $X_1(t+1)$ were missing, e.g., $X_2(t)$ is simply a passive observation of $X_1(t)$. Then $X_2(t)$ and $X_1(t+1)$ are conditionally independent given $X_1(t)$. Now the data processing inequality [13] imposes non-positivity of the (time-discretized) information flow,

$$I[X_1(t+1):X_2(t)]-I[X_1(t):X_2(t)]\leq0, \tag{27}$$

because $X_1(t+1)$ is obtained from $X_1(t)$ by a stochastic map (which can never increase the information about $X_2(t)$ without having access to this quantity). The left-hand side of (27) can, however, be strictly below zero if the bath dissipates some of the information. If it is positive there must be an arrow from $X_2(t)$ to $X_1(t+1)$, i.e., there is information flowing from $X_2$ to $X_1$. However, the presence of such an arrow does not guarantee positivity of the left-hand side of (27), because the amount of information dissipated by the bath can exceed the one provided by the arrow. We can nevertheless interpret $I[X_1(t+1):X_2(t)]-I[X_1(t):X_2(t)]$ as the information flow in the sense of a net effect.
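The data-processing argument can be illustrated with binary variables. In the sketch below, which is a toy model assumed only for illustration, $X_2(t)$ is a noisy copy of $X_1(t)$ (a passive observation), and $X_1(t+1)$ is obtained from $X_1(t)$ alone by a stochastic map; the flip probabilities and function names are hypothetical. The time-discretized flow is then never positive.

```python
import math

def mutual_information(joint):
    """I[A:B] in nats for a joint table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:
                total += p * math.log(p / (p_a[a] * p_b[b]))
    return total

def flip_channel(eps):
    """Binary symmetric channel: row = input value, column = output value."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]

def discrete_flow(p1, obs, step):
    """I[X1(t+1):X2(t)] - I[X1(t):X2(t)] when there is no arrow X2(t) -> X1(t+1).

    p1[x1]        : distribution of X1(t)
    obs[x1][x2]   : P(X2(t) = x2 | X1(t) = x1), the passive observation
    step[x1][x1n] : P(X1(t+1) = x1n | X1(t) = x1), the stochastic map"""
    n = len(p1)
    joint_now = [[p1[x1] * obs[x1][x2] for x2 in range(n)] for x1 in range(n)]
    joint_next = [[sum(p1[x1] * step[x1][x1n] * obs[x1][x2] for x1 in range(n))
                   for x2 in range(n)] for x1n in range(n)]
    return mutual_information(joint_next) - mutual_information(joint_now)

flow = discrete_flow([0.5, 0.5], flip_channel(0.1), flip_channel(0.2))
print("time-discretized information flow:", flow)
```

With a noiseless map (`flip_channel(0.0)`) the flow vanishes, and any noise in the map makes it strictly negative in this model, which mirrors the role of the bath described above.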



This on indicates on one-way flow of information. Be- 
low we clarify how this flow relates to the temperature 
difference. 

Hi) Non-equilibrium states can be maintained exter- 
nally by time-varying conservative forces, or, alterna- 
tively, by time-independent non-conservative forces ac- 
companied by cyclic boundary conditions. Here we do 
not consider this type of non-equilibrium. 



THERMODYNAMIC ASPECTS OF 
INFORMATION FLOW 



B. Entropy production and heat dissipation. 



A. Information flow nullifies in equilibrium 

So far we have discussed the formal definition of in- 
formation flow 12— >i- This definition relates 12^1 to the 
concept of mutual information. It is however expected 
that there should be a more physical way of understand- 
ing 12— >i, since the concepts of entropy and entropy flow 
are defined and discussed in thermodynamics of non- 
equilibrium systems 0, \4^, [5(1 [Hj]. We now turn to 
this aspect. 

Recall that the Fokker-Planck equation (0 [9]) describes 
a system interacting with two thermal baths at different 
temperatures T\ and T 2 . If T x = T 2 = T then the two- 
particle system relaxes with time to the Gibbs distribu- 
tion with the common temperature T [46| : 



P eq (x) 



f 



-pH{x) 



Z= / d 



-/8H(x) 



(28) 



An important feature of the equilibrium probability 
distribution (|28|) is that the currents of probability Q 
do not depend on time and explicitly nullify in that state 



Ji(x) = J 2 (x) = 0. 



(29) 



This detailed balance feature — which is both necessary 
and sufficient for equilibrium — indicates that in the equi- 
librium state there is no transfer of any physical quantity, 
e.g., there is no transfer of energy (heat). 

Eq. (UHJ) implies that the same holds for 12— >i: in the 
equilibrium state there is no information flow, 

12-fl = ii->2 = 0. 

In other words, the transfer of information should always 
be connected with a certain non-equilibrium situation. 
One may distinguish several types of such situations: 

i) Non-stationary (transient) states, where the joint 
distribution P(xi,x 2 ;t) is time-dependent. 

ii) Stationary state, but non-equilibrium states real- 
ized for Ti 7^ T 2 . Now the probability currents J\ and J 2 
are not zero (only d\J\ +d 2 J 2 = holds), and so are the 
information transfer rates 12— >i and ii_ ,2. However, since 
the state is stationary, their sum nullifies due to (f22|) 



A non-equilibrium state will show a tendency towards 
equilibrium, or, as expressed by the second law, by the 
entropy production of the overall system (in our case the 
brownian particles plus their thermal baths). Since for 
the model ([51 [HI HJ) the baths are in equilibrium and the
cause of non-equilibrium is related to the brownian par- 
ticles, the overall entropy production can be expressed 
via the variables pertaining to the brownian particles 
[H, IH, [H . Recall CEH) and define for i = 1, 2 

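$$\frac{d_iQ}{dt} = -\int dx\, H(x)\, \partial_i J_i(x;t), \tag{31}$$
$$\frac{d_iS}{dt} = \int dx\, [\ln P(x;t)]\, \partial_i J_i(x;t), \tag{32}$$

and note that these quantities satisfy

$$\ell_i \equiv \frac{d_iS}{dt} - \beta_i \frac{d_iQ}{dt} \tag{33}$$
$$\phantom{\ell_i} = \beta_i \Gamma_i \int dx\, \frac{J_i^2(x;t)}{P(x;t)} \geq 0, \tag{34}$$

where $\ell_1 = \ell_2 = 0$ holds in the equilibrium only, where $J_1 = J_2 = 0$; see (29).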

Our discussion in section III A implies that $d_iQ/dt$ is the change of energy of the two-particle system due to the dynamics of $x_i$. Since in the overdamped regime this dynamics is driven by the bath at temperature $T_i$, we see that $d_iQ/dt$ is the heat received by the two-particle system from the bath at temperature $T_i$ [5Jj,|53( . Thus the energy (heat) received by the bath is $-d_iQ/dt$. Likewise, $d_iS/dt$ is the change of entropy of the two-particle system due to the dynamics of $x_i$. Then (34) is the local version of the Clausius inequality [H(l l53l |, which implies that $\ell_i$ is the [local] entropy produced per time due to the interaction with the bath at temperature $T_i$, while



$$\ell \equiv \ell_1 + \ell_2 = \frac{dS}{dt} - \beta_1 \frac{d_1Q}{dt} - \beta_2 \frac{d_2Q}{dt} \tag{35}$$



is the total entropy produced per unit of time in the 
overall system: two equilibrium baths plus two brownian 
particles 0, H US [HI • 

Noting our discussion in Appendix C one can see that $\ell_i/(\beta_i\Gamma_i)$ is equal to the space-average $\int dx\, P(x;t)\, v_i^2(x;t)$, where $v_i(x;t)$ is the coarse-grained velocity; see (|C41IC4[) $^{6}$. Similarly, integrating (31) by parts one notes that $d_iQ/dt$ amounts to the space-averaged work done by the potential force: $d_iQ/dt = \int dx\, P(x;t)\, v_i(x;t)\, \partial_i H(x)$ $^{7}$.

Note that in general $\ell_i$ (and thus $\ell$) are non-zero for the non-equilibrium stationary state realized for $T_1 \neq T_2$, since then $J_i(x) \neq 0$. This is consistent with the interpretation of these states as "metastable" non-equilibrium states, where some heat flows between the two baths, but the temperatures $T_i$ do not change in time due to the macroscopic size of the baths (these temperatures would change for very large times, which are, however, beyond the time-scales considered here).

Likewise, $T_i\ell_i$ is the heat dissipated per time due to the interaction with the bath at temperature $T_i$, while $T_1\ell_1 + T_2\ell_2$ is the total dissipated energy (heat) per time.



C. Equivalence between flow of mutual 
information and flow of entropy 



the entropy rate of the first particle itself, we get the 
entropy flow from the second particle to the first one. 

It should now be clear that this definition of entropy flow is just equivalent to the definition of information flow (17). Eq. (36) can also be written as



12- 



dt 



dt 



~~dT ' dt 



making clear again that $\imath_{2\to 1}$ is the change of mutual information due to the dynamics of the first particle.



VI. EFFICIENCY OF INFORMATION FLOW 

Once it is realized that non-zero information transfer 
is possible only out of equilibrium, the existence of such 
a transfer is related to entropy production. This is an 
entropic cost of the information transfer. One can define 
a dimensionless ratio: 



$$\eta_{2\to 1} = \frac{\imath_{2\to 1}}{\ell}, \tag{38}$$



The picture that emerges from the above consideration is as follows. The entropy is produced with the rate $\ell_1$ somewhere at the interface between the first brownian particle and its bath. Eq. (34) expresses the fact that $d_1S/dt$ is the rate by which a part of the produced entropy flows into the two-particle system. The rest of the produced entropy goes to the bath with the rate $-\beta_1\, d_1Q/dt$. This is consistent with the conservation of energy and the fact that the bath itself is in equilibrium; then $-d_1Q/dt$ is the rate by which heat goes to the bath, and after dividing by $T_1$ (since the bath is in equilibrium) this becomes the rate with which the bath receives entropy.

The corresponding argumentation can be repeated for 
the second particle and its thermal bath. 

The thermodynamic definition of entropy flow eventu- 
ally reads 



$$\imath_{2\to 1} = \frac{dS_1}{dt} - \frac{d_1S}{dt}. \tag{36}$$



Once $d_1S/dt$ is the entropy entering into the two-particle system via the first particle, then subtracting $d_1S/dt$ from



Note from (34) that for $T_1 = T_2$ (where the non-stationarity is the sole cause of non-equilibrium) the heat dissipation, or the total entropy production $\ell$ times $T_1 = T_2 = T$, is equal to the negative rate of free energy [Hj]: $-T\ell = \frac{dF}{dt} = \frac{d}{dt}\int dx\, [H(x) + T \ln P(x;t)]\, P(x;t)$. Note that the free energy here is defined with respect to the bath temperature $T$ and that $P(x;t)$ is an arbitrary non-equilibrium distribution that relaxes to the equilibrium: $P_{\rm eq}(x) \propto \exp[-\beta H(x)]$. The difference $F - F_{\rm eq}$ between the non-equilibrium and equilibrium free energy is the maximal work that can be extracted from the non-equilibrium system in contact with the thermal bath [2dl | .

A similar argument for the physical meaning of $d_iQ/dt$ can be given via stochastic energetics |5al .



which is the desired output (= information transfer) over the irreversibility cost (= total entropy production). This quantity characterizes the efficiency of information transfer. A large $\eta_{2\to 1}$ is desirable, since it gives a larger information transfer rate at a lesser cost.

Note that $\eta_{2\to 1}$ is more similar to the coefficient of performance of thermal refrigerators (which is also defined as the useful output, i.e., heat extracted from a colder body, over the cost, i.e., work) than to the efficiency of heat engines. The latter is defined as the useful output over the resource entered into the engine. We shall still call $\eta_{2\to 1}$ efficiency, but this distinction is to be kept in mind.

Below in several different situations we shall establish upper bounds on $\eta$. These determine the irreversibility (entropy) cost of information transfer. The usage of the
efficiency for informational processes was advocated in 

M. 



A. Stationary case 

In the stationary two-temperature scenario the joint probability $P(x_1,x_2)$ and the probability currents $J_1(x_1,x_2)$ and $J_2(x_1,x_2)$ do not depend on time. Thus many observables (e.g., the average energy of the two Brownian particles, their entropy, entropies of separate particles) do not depend on time either.



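Employing the stationarity conditions $dS_1/dt = dS/dt = 0$ and $d_1Q/dt + d_2Q/dt = 0$ in (33), (35) and (36), we get

$$\imath_{2\to 1} = \frac{T_1\ell_1 + T_2\ell_2}{T_2 - T_1}, \qquad T_1\ell_1 + T_2\ell_2 \geq 0. \tag{39}$$

After interchanging the indices 1 and 2 we get the analogous equation for $\imath_{1\to 2}$. Recalling that (30) holds in the stationary state, we get

$$\imath_{1\to 2} = -\imath_{2\to 1} = \frac{T_1\ell_1 + T_2\ell_2}{T_1 - T_2}. \tag{40}$$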

First of all, (39) implies that $\imath_{2\to 1} > 0$ for $T_2 > T_1$, which means that in the stationary two-temperature scenario the information (together with heat) flows from higher to lower temperature.

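For the efficiency we get from (39, 40):

$$\eta_{2\to 1} = \frac{T_1}{T_2 - T_1} + \frac{\ell_2}{\ell_1 + \ell_2} \tag{41}$$
$$\phantom{\eta_{2\to 1}} = \frac{T_2}{T_2 - T_1} - \frac{\ell_1}{\ell_1 + \ell_2}. \tag{42}$$

This then implies [since $\ell_1 \geq 0$, $\ell_2 \geq 0$]

$$\frac{T_1}{T_2 - T_1} \leq \eta_{2\to 1} \leq \frac{T_2}{T_2 - T_1}. \tag{43}$$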



For $T_2 > T_1$, $\eta_{2\to 1}$ is positive and is bounded from above by $\frac{T_2}{T_2 - T_1}$, which means that in the stationary state the ratio of the information transfer (rate) over the entropy production (rate) is bounded.
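The two equivalent forms (41), (42) and the bounds (43) can be checked with a few lines of Python. This is only an illustrative sketch: the function names `stationary_efficiency` and `efficiency_bounds`, as well as the sample numbers, are assumptions made here and are not part of the original text.

```python
def stationary_efficiency(T1, T2, l1, l2):
    """Stationary efficiency eta_{2->1} from Eqs. (41)-(42).

    l1, l2 are the (non-negative) partial entropy productions.
    """
    if T1 == T2:
        raise ValueError("the stationary efficiency requires T1 != T2")
    if l1 < 0 or l2 < 0 or l1 + l2 == 0:
        raise ValueError("need l1, l2 >= 0 and l1 + l2 > 0")
    form_41 = T1 / (T2 - T1) + l2 / (l1 + l2)   # Eq. (41)
    form_42 = T2 / (T2 - T1) - l1 / (l1 + l2)   # Eq. (42)
    # Both forms must agree up to rounding.
    assert abs(form_41 - form_42) <= 1e-12 * max(1.0, abs(form_41))
    return form_41


def efficiency_bounds(T1, T2):
    """Lower and upper bounds of Eq. (43)."""
    return T1 / (T2 - T1), T2 / (T2 - T1)


if __name__ == "__main__":
    eta = stationary_efficiency(1.0, 2.0, 0.3, 0.1)
    print(eta, efficiency_bounds(1.0, 2.0))
```

The upper bound is attained for `l1 = 0`, i.e., when only the particle coupled to the bath at temperature `T2` produces entropy.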

It is important to note from (43) that for $T_2$ approaching $T_1$ from above, $T_2 \to T_1$, the efficiency of information transfer tends to plus infinity, since in this reversible limit the entropy production, which should be always positive, naturally scales as $(T_1 - T_2)^2$, while the information flow scales as $T_2 - T_1$. Thus a very slow flow of information can be accompanied by very little entropy production, such that the efficiency becomes very large. A similar argument led some authors to conclude that there is no fundamental cost for the information transfer at all [H, EH, [IB] • Our analysis makes clear that this interpretation would be misleading. It is more appropriate to say that in the reversible limit the thermodynamic cost, while becoming less restrictive [since the efficiency can be very large], is certainly still there, because the difference $T_2 - T_1$ is, after all, always finite; otherwise the very information flow would vanish.

It remains to stress that the relations (39)-(43) are general and do not depend on the details of the considered Brownian system. They will hold for any bi-partite Markovian system which satisfies the master equation (16) and local formulations of the second law.



B. Reachability of the upper efficiency bound 

In the remaining part of this section we show that the upper bound (43) for the efficiency of information flow is reached in a certain class of Brownian systems satisfying time-scale separation.



1. The stationary probability in the adiabatic case. 

For $T_1 = T_2$, the stationary probability distribution $P(x)$ of the two-particle system (JH1 EI) is Gibbsian: $P(x) \propto e^{-\beta_1 H(x)}$. For non-equal temperatures $T_1 \neq T_2$ a general expression for this stationary probability can be derived in the adiabatic situation, where $x_2$ changes in time much slower than $x_1$ [HD, |H, H3, HH • This is ensured by

$$\varepsilon \equiv \Gamma_1/\Gamma_2 \ll 1. \tag{44}$$



Below we present a heuristic derivation of the stationary distribution $P(x)$ in the order $\varepsilon^0$ [52|, HH, H3, HH ; a more systematic presentation, as well as higher-order corrections in $\varepsilon$, are discussed in [53].

On the time scales relevant for $x_1$, the variable $x_2$ is fixed, and the conditional probability $P_{1|2}(x_1|x_2)$ is Gibbsian [$\beta_1 = 1/T_1$]:

$$P_{1|2}(x_1|x_2) = \frac{e^{-\beta_1 H(x)}}{Z(x_2)}, \qquad Z(x_2) = \int dx_1\, e^{-\beta_1 H(x)}. \tag{45}$$

The stationary probability $P_2(x_2)$ for the slow variable is found by noting that on the times relevant for $x_2$, $x_1$ is already in the conditional steady state. Thus the force $\partial_2 H(x_1,x_2)$ acting on $x_2$ can be averaged over $P_{1|2}(x_1|x_2)$:

$$\int dx_1\, \partial_2 H(x)\, P_{1|2}(x_1|x_2) = \partial_2 F(x_2),$$

where $F(x_2) = -T_1 \ln Z(x_2)$ is the conditional free energy. Once the averaged dynamics of $x_2$ is governed by the effective potential energy $F(x_2)$, the stationary $P_2(x_2)$ is Gibbsian at the inverse temperature $\beta_2 = 1/T_2$:

$$P_2(x_2) = \frac{1}{\mathcal{Z}}\, e^{-\beta_2 F(x_2)}, \qquad \mathcal{Z} = \int dx_2\, e^{-\beta_2 F(x_2)}.$$

Thus the joint stationary probability is

$$P(x) = P_2(x_2)\, P_{1|2}(x_1|x_2) = \frac{1}{\mathcal{Z}}\, e^{(\beta_1 - \beta_2) F(x_2) - \beta_1 H(x)}. \tag{46}$$



For further calculations we shall need the stationary current $J_2(x)$, which reads from [201 ErB"))

$$J_2(x) = \frac{T_1 - T_2}{T_1 \Gamma_2}\, P(x)\, \phi_{1\to 2}(x), \tag{47}$$

$$\phi_{1\to 2}(x) = -\partial_2 H_{12}(x) + \int dy\, P_{1|2}(y|x_2)\, \partial_2 H_{12}(y,x_2), \tag{48}$$

where $\phi_{1\to 2}(x)$ is the force acting on 2 from 1 minus its average over the fast $x_1$. Both $J_1(x)$ and $J_2(x)$ are of the same order $O(\varepsilon/\Gamma_1) = O(1/\Gamma_2)$.

Let us describe how to obtain $J_1(x)$ (we shall need below at least an order of magnitude estimate of $J_1$). Upon substituting (46) into (9) we find that the current $J_1(x)$ nullifies. This means that $J_1(x)$ has to be searched for to first order in $\varepsilon = \Gamma_1/\Gamma_2$. Assuming for the corrected probability $\widetilde{P}(x) = P(x)[1 - \varepsilon \mathcal{A}(x)] + O(\varepsilon^2)$ we get

$$J_1(x) = \varepsilon\, \frac{T_1}{\Gamma_1}\, P(x)\, \partial_1 \mathcal{A}(x), \tag{49}$$

where $\mathcal{A}(x)$ does not depend on $\varepsilon$ and is to be found from the stationarity equation $\partial_1 J_1 + \partial_2 J_2 = 0$. Concrete expressions for $\mathcal{A}(x)$ are presented in [53].



2. Entropy production and heat dissipation in the adiabatic 
stationary case. 

Using (34) together with (47) and (49) we get for the partial entropy productions:

$$\ell_1 = \varepsilon\, \frac{T_1}{\Gamma_2} \int dx\, [\partial_1 \mathcal{A}(x)]^2\, P(x), \tag{50}$$

$$\ell_2 = \frac{(T_2 - T_1)^2}{\Gamma_2 T_1^2 T_2} \int dx\, \phi_{1\to 2}^2(x)\, P(x). \tag{51}$$

Note that $\ell_1$ contains an additional small factor $\varepsilon$ as compared to $\ell_2$. This is natural, since the fast system $x_1$ is in a local thermal equilibrium. Thus in the considered order $\varepsilon^0$ the overall entropy production $\ell$ is dominated by the entropy production of the slow sub-system:

$$\ell_1 = 0, \qquad \ell = \ell_2. \tag{52}$$

Thus the heat dissipation is simply $T_2 \ell = T_2 \ell_2$. The physical meaning of $\ell_1 = 0$ is that the fast system does not produce entropy [and does not dissipate heat], since in its fast characteristic times it sees the slow variable as a frozen [not fluctuating] field. Thus the fast system remains in local equilibrium.



3. Efficiency of information transfer in the adiabatic 
stationary case. 



Employing (52) we can immediately see from (42) that the efficiency in the considered order of $\varepsilon$ reads:

$$\eta_{2\to 1} = \frac{T_2}{T_2 - T_1}. \tag{53}$$



Thus, the efficiency reaches its maximal value when the 
hotter system is the slowest one. 

Recall that in the adiabatic limit the information flow is of order $O(1/\Gamma_2)$, i.e., it is small on the characteristic time scale of the fast particle, but sizable on the characteristic time of the slow particle. In more detail, using (46, 47) we get for the information flow $\imath_{2\to 1}$ from the slow to the fast particle:

$$\imath_{2\to 1} = \frac{T_2 - T_1}{\Gamma_2 T_1^2} \int dx\, \phi_{1\to 2}^2(x)\, P(x). \tag{54}$$

The reachability of the upper bound for the efficiency $\eta_{2\to 1}$ appears to resemble the reachability of the optimal Carnot efficiency for heat engines. However, for heat engines the Carnot efficiency is normally reached for processes that are much slower than any internal characteristic time of the engine working medium. In contrast, the information flow in (53) is sizable on the time-scale $\Gamma_2$ (which is one of the internal time-scales).



VII. EXACTLY SOLVABLE MODEL: TWO 
COUPLED HARMONIC OSCILLATORS 

We exemplify the obtained results by the exactly solvable model of two coupled harmonic oscillators. We shall also employ this model to check whether the upper bound (43) may hold in the non-stationary situation.

The Hamiltonian [or potential energy] is given as

$$H(x) = \frac{a_1}{2} x_1^2 + \frac{a_2}{2} x_2^2 + b\, x_1 x_2, \tag{55}$$

where the constants $a_1$, $a_2$ and $b$ have to satisfy

$$a_1 > 0, \qquad a_2 > 0, \qquad a_1 a_2 > b^2, \tag{56}$$

for the Hamiltonian to be positive definite.

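Let us assume that the probability of the two-particle system is Gaussian [this holds for the stationary probability as seen below]

$$-\ln P(x) = \frac{A_1}{2} x_1^2 + \frac{A_2}{2} x_2^2 + B\, x_1 x_2 + {\rm constant}. \tag{57}$$

Here we did not specify the irrelevant normalization constant, and the constants $A_1$, $A_2$ and $B$ read

$$\begin{pmatrix} A_1 & B \\ B & A_2 \end{pmatrix} = \begin{pmatrix} \langle x_1^2 \rangle & \langle x_1 x_2 \rangle \\ \langle x_1 x_2 \rangle & \langle x_2^2 \rangle \end{pmatrix}^{-1} \equiv \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1}, \tag{58}$$

where $\langle \ldots \rangle$ is the average over the probability distribution (57). All the involved parameters ($A_1$, $A_2$ and $B$) can be in general time-dependent. Note that in (57) we assumed $\langle x_1 \rangle = \langle x_2 \rangle = 0$.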

For the Hamiltonian (55) and the probability (57) the information transfer reads from (21):

$$\imath_{2\to 1} = \frac{\sigma_{12}\, (T_1 B - b)}{\Gamma_1 \sigma_{11}}. \tag{59}$$

Likewise, we get for the local entropy production (34):

$$\ell_1 = \frac{T_1}{\Gamma_1} \left[ \sigma_{11} (\beta_1 a_1 - A_1)^2 + \sigma_{22} (\beta_1 b - B)^2 + 2 \sigma_{12} (\beta_1 a_1 - A_1)(\beta_1 b - B) \right], \tag{60}$$

where the information transfer $\imath_{1\to 2}$ and the entropy production $\ell_2$ are obtained from (59) and (60), respectively, by interchanging the indices 1 and 2.

A. Stationary case.

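Provided that the stability conditions (56) hold, any initial probability of the two-particle system relaxes to the time-independent Gaussian stationary probability (57) [46]. The averages $\langle x_1^2 \rangle$, $\langle x_1 x_2 \rangle$ and $\langle x_2^2 \rangle$ are deduced from the stationarity equation $\partial_1 J_1(x) + \partial_2 J_2(x) = 0$:

$$\langle x_1^2 \rangle = \frac{a_2 T_1 (a_1\gamma_1 + a_2\gamma_2) + b^2 \gamma_1 (T_2 - T_1)}{(a_1 a_2 - b^2)(a_1\gamma_1 + a_2\gamma_2)}, \tag{61}$$

$$\langle x_2^2 \rangle = \frac{a_1 T_2 (a_1\gamma_1 + a_2\gamma_2) + b^2 \gamma_2 (T_1 - T_2)}{(a_1 a_2 - b^2)(a_1\gamma_1 + a_2\gamma_2)}, \tag{62}$$

$$\langle x_1 x_2 \rangle = -\frac{b\, (a_2\gamma_2 T_1 + a_1\gamma_1 T_2)}{(a_1 a_2 - b^2)(a_1\gamma_1 + a_2\gamma_2)}, \tag{63}$$

$$\langle x_1^2 \rangle \langle x_2^2 \rangle - \langle x_1 x_2 \rangle^2 = \frac{b^2 \gamma_1 \gamma_2 (T_1 - T_2)^2 + (a_1\gamma_1 + a_2\gamma_2)^2\, T_1 T_2}{(a_1 a_2 - b^2)(a_1\gamma_1 + a_2\gamma_2)^2}, \tag{64}$$

where

$$\gamma_1 = 1/\Gamma_1, \qquad \gamma_2 = 1/\Gamma_2,$$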

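and where the conditions $\langle x_1 \rangle = \langle x_2 \rangle = 0$ hold automatically. Let us introduce the following dimensionless parameters:

$$\varphi \equiv \frac{\Gamma_2 a_1}{\Gamma_1 a_2} = \frac{\gamma_1 a_1}{\gamma_2 a_2}, \qquad \xi \equiv \frac{T_1}{T_2}, \qquad \kappa \equiv \frac{b^2}{a_1 a_2}, \tag{65}$$

where $\varphi$ is the ratio of time-scales, while $\kappa$ characterizes the interaction strength; note that $0 < \kappa < 1$ due to (56).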

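Using (61)-(65) together with (59) and (60), we get for the information flow, the total entropy production, and the efficiency

$$\imath_{2\to 1} = \frac{a_2}{\Gamma_2}\, \frac{\kappa\varphi\, (1 - \xi)(\xi + \varphi)}{\kappa\varphi\, (1 - \xi)^2 + \xi (1 + \varphi)^2}, \tag{66}$$

$$\ell = \frac{a_2}{\Gamma_2}\, \frac{(1 - \xi)^2\, \kappa\varphi}{\xi\, (1 + \varphi)}, \tag{67}$$

$$\eta_{2\to 1} = \frac{\xi}{1 - \xi}\, \frac{(\xi + \varphi)(1 + \varphi)}{\kappa\varphi\, (1 - \xi)^2 + \xi (1 + \varphi)^2}. \tag{68}$$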


Let us for simplicity assume that $\xi < 1$, i.e., $T_2 > T_1$. Now $\imath_{2\to 1} > 0$. It is seen that both $\imath_{2\to 1}$ and $\ell$ are monotonically increasing functions of $\varphi$. Thus both the information transfer and the overall entropy production maximize when the system attached to the hotter bath at the temperature $T_2$ is slower [the same holds for the overall heat dissipation]. The efficiency $\eta_{2\to 1} = \imath_{2\to 1}/\ell$ is also equal to its maximal value (53) in the same limit $\varphi \to \infty$; see (68).

The behaviour with respect to the dimensionless coupling $\kappa$ is different. Now $\imath_{2\to 1}$ and $\ell$ are again monotonically increasing functions of $\kappa$. They both maximize for $\kappa \to 1$, which, as seen from (61)-(63), means at the instability threshold, where the fluctuations are very large. However, the efficiency $\eta_{2\to 1}$ is a monotonically decreasing function of $\kappa$. It maximizes (as a function of $\kappa$) in the almost uncoupled limit $\kappa \to 0$. Thus, as far as the behaviour with respect to the coupling constant is concerned, there is a complementarity between maximizing the information flow and maximizing its efficiency.
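The stationary results of this subsection can be cross-checked numerically. The sketch below assumes the overdamped linear dynamics behind the Fokker-Planck equation, i.e., a drift matrix built from $\gamma_i = 1/\Gamma_i$ and diffusion constants $\gamma_i T_i$; under that assumption the stationary covariance solves a Lyapunov equation. The information flow and the entropy productions are then evaluated from (59), (60) and compared with the closed forms (66)-(68). All function names are illustrative and not taken from the paper.

```python
import numpy as np


def stationary_covariance(a1, a2, b, G1, G2, T1, T2):
    """Solve M S + S M^T = 2 D for the stationary covariance S (assumed linear dynamics)."""
    g1, g2 = 1.0 / G1, 1.0 / G2
    M = np.array([[g1 * a1, g1 * b], [g2 * b, g2 * a2]])   # drift: dx/dt = -M x + noise
    D = np.diag([g1 * T1, g2 * T2])                        # diffusion constants
    I2 = np.eye(2)
    # Row-major vectorisation: vec(M S) = (M kron I) vec(S), vec(S M^T) = (I kron M) vec(S)
    L = np.kron(M, I2) + np.kron(I2, M)
    return np.linalg.solve(L, (2.0 * D).ravel()).reshape(2, 2)


def flow_and_production(a1, a2, b, G1, G2, T1, T2):
    """Return (i_21, l_1, l_2) from Eqs. (59) and (60) in the stationary state."""
    S = stationary_covariance(a1, a2, b, G1, G2, T1, T2)
    A = np.linalg.inv(S)                                   # Eq. (58)
    s11, s12, s22 = S[0, 0], S[0, 1], S[1, 1]
    A1, A2, B = A[0, 0], A[1, 1], A[0, 1]
    i21 = s12 * (T1 * B - b) / (G1 * s11)                  # Eq. (59)
    p1, q1 = a1 / T1 - A1, b / T1 - B
    l1 = (T1 / G1) * (s11 * p1**2 + s22 * q1**2 + 2 * s12 * p1 * q1)   # Eq. (60)
    p2, q2 = a2 / T2 - A2, b / T2 - B
    l2 = (T2 / G2) * (s22 * p2**2 + s11 * q2**2 + 2 * s12 * p2 * q2)   # Eq. (60), 1 <-> 2
    return i21, l1, l2


def closed_forms(a2, G2, phi, xi, kappa):
    """Closed forms (66)-(68) in the dimensionless parameters (65)."""
    N = kappa * phi * (1 - xi) ** 2 + xi * (1 + phi) ** 2
    i21 = (a2 / G2) * kappa * phi * (1 - xi) * (xi + phi) / N
    l = (a2 / G2) * kappa * phi * (1 - xi) ** 2 / (xi * (1 + phi))
    eta = xi * (xi + phi) * (1 + phi) / ((1 - xi) * N)
    return i21, l, eta


if __name__ == "__main__":
    a1, a2, b, G1, G2, T1, T2 = 2.0, 0.7, 0.4, 0.5, 3.0, 1.3, 2.1
    i21, l1, l2 = flow_and_production(a1, a2, b, G1, G2, T1, T2)
    phi, xi, kappa = G2 * a1 / (G1 * a2), T1 / T2, b**2 / (a1 * a2)
    print(i21, l1 + l2, i21 / (l1 + l2))
    print(closed_forms(a2, G2, phi, xi, kappa))
```

Setting `T1 == T2` in `flow_and_production` gives vanishing flow and entropy production, in line with the equilibrium statement (29).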



Non-stationary situation. 



Once the upper bound (43) on the efficiency of information transfer is established in the stationary situation, we ask whether it might survive also in the non-stationary case. Below we reconsider the coupled harmonic oscillators and show that, although the upper limit (43) can be exceeded by some non-stationary states, the efficiencies $\eta_{2\to 1}$ and $\eta_{1\to 2}$ are still limited from above.

We assume that a non-equilibrium probability of the two harmonic oscillators is Gaussian, as given by (57, 58). The information flow and the entropy production are given by (59) and (60), respectively. Any Gaussian probability can play the role of some non-stationary state, which is chosen, e.g., as an initial condition.

To search for an upper bound of the efficiency in the non-stationary situation, we assume a Gaussian probability (57) and maximize the efficiency (38) over the Hamiltonian, i.e., over the parameters $a_1$, $a_2$ and $b$ in (55). The stability conditions (56), which are necessary for the existence of the stationary state, are not necessary for studying non-stationary situations (imagine a non-stationary down-hill moving oscillator). Thus conditions (56) will not be imposed during the maximization over $a_1$, $a_2$ and $b$.

Note that $\imath_{2\to 1}$ in (59) does not depend on $a_1$ and $a_2$. Since we are interested in maximizing the efficiency (38), the first step in this maximization is to minimize the local entropy production $\ell_1$ [given by (60)] over $a_1$. This produces



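$$a_1 = \frac{T_1 - b\, \sigma_{12}}{\sigma_{11}}, \qquad \ell_1 = \frac{\beta_1 (T_1 B - b)^2}{\Gamma_1 A_2}. \tag{69}$$

The second entropy production $\ell_2$ is analogously minimized over $a_2$. The resulting expression for the efficiency reads:

$$\eta_{2\to 1} = \frac{\sigma_{12}}{\Gamma_1 \sigma_{11}}\, (T_1 B - b) \left[ \frac{\beta_1 (T_1 B - b)^2}{\Gamma_1 A_2} + \frac{\beta_2 (T_2 B - b)^2}{\Gamma_2 A_1} \right]^{-1}. \tag{70}$$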



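Next, we maximize this expression over $b$. Let us for simplicity assume $\sigma_{12} > 0$, since the conclusions do not change for $\sigma_{12} < 0$. The maximal value of $\eta_{2\to 1}$ reads

$$\eta_{2\to 1} = \frac{T_1}{2 |T_2 - T_1|} \left[ {\rm sign}(T_2 - T_1) + \sqrt{1 + \chi} \right], \tag{71}$$

where

$$\chi \equiv \frac{\Gamma_2 T_2 A_1}{\Gamma_1 T_1 A_2} = \frac{\Gamma_2 T_2 \sigma_{22}}{\Gamma_1 T_1 \sigma_{11}}, \tag{72}$$

and where the RHS of (71) is reached for

$$b = -\frac{\sigma_{12}}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left[ T_1 + \frac{|T_2 - T_1|}{\sqrt{1 + \chi}} \right]. \tag{73}$$

The information flow $\imath_{2\to 1}$ at the optimal value of $b$ reads

$$\imath_{2\to 1} = \frac{\sigma_{12}^2}{\sigma_{11}\, (\sigma_{11}\sigma_{22} - \sigma_{12}^2)}\, \frac{|T_1 - T_2|}{\Gamma_1 \sqrt{1 + \chi}}. \tag{74}$$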



Note that the maximal value of $\eta_{2\to 1}$ does not require a strong inter-particle coupling. This coupling, as quantified by (73), can be small due to $\sigma_{12} \to 0$.

Now we assume that the temperatures $T_1$ and $T_2$ are fixed. It is seen from (71) that taking $\chi$ sufficiently large (either due to a large $\Gamma_2/\Gamma_1$, which means that the second particle is slow, or due to a large $\sigma_{22}/\sigma_{11}$, which means that its dispersion is larger) we can achieve efficiencies as large as desired. In all these cases, for a sufficiently large $\chi$, $\eta_{2\to 1}$ scales as $\sqrt{\chi}$. Thus one can overcome the stationary bound (43) via some special non-stationary states and the corresponding Hamiltonians. We, however, see from (74) that increasing the efficiency due to $\chi \to \infty$ leads to decreasing the information flow. This is the same trend as in the stationary case; see (43).

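The large values of $\eta_{2\to 1}$ do not imply anything special for the efficiency $\eta_{1\to 2}$ of the inverse information transfer. Indeed, the partially optimized $\eta_{1\to 2}$ reads analogously to (70)

$$\eta_{1\to 2} = \frac{\sigma_{12}}{\Gamma_2 \sigma_{22}}\, (T_2 B - b) \left[ \frac{\beta_1 (T_1 B - b)^2}{\Gamma_1 A_2} + \frac{\beta_2 (T_2 B - b)^2}{\Gamma_2 A_1} \right]^{-1}. \tag{75}$$

At the values (69, 73), where $\eta_{2\to 1}$ extremizes, $\eta_{1\to 2}$ assumes a simple form

$$\eta_{1\to 2} = \frac{T_2}{2 (T_1 - T_2)},$$

i.e., $\eta_{1\to 2}$ depends only on the temperatures and can have either sign, depending on the sign of $T_1 - T_2$. Note that $\eta_{1\to 2}$ follows the same logic as in the stationary state: it is positive for $T_1 > T_2$.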
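The maximization over $b$ can be illustrated numerically. The sketch below takes an arbitrary Gaussian state (its covariances are free inputs), evaluates the partially optimized efficiency in the form written in (70), and compares a scan over $b$ with the closed-form maximum (71) and the optimal coupling (73). The function names and the sample covariances are assumptions made for illustration only.

```python
import math


def partially_optimized_efficiency(b, T1, T2, G1, G2, s11, s12, s22):
    """Efficiency eta_{2->1} after minimising l1 over a1 and l2 over a2, cf. Eq. (70)."""
    det = s11 * s22 - s12**2
    A1, A2, B = s22 / det, s11 / det, -s12 / det          # inverse covariance, Eq. (58)
    flow = s12 * (T1 * B - b) / (G1 * s11)                # Eq. (59)
    cost = (T1 * B - b) ** 2 / (G1 * T1 * A2) + (T2 * B - b) ** 2 / (G2 * T2 * A1)
    return flow / cost


def optimal_coupling_and_efficiency(T1, T2, G1, G2, s11, s12, s22):
    """Optimal b, cf. Eq. (73), and the maximal efficiency, Eq. (71)."""
    det = s11 * s22 - s12**2
    B = -s12 / det
    chi = G2 * T2 * s22 / (G1 * T1 * s11)                 # Eq. (72)
    root = math.sqrt(1.0 + chi)
    b_opt = B * (T1 + abs(T2 - T1) / root)
    eta_max = T1 / (2.0 * abs(T2 - T1)) * (math.copysign(1.0, T2 - T1) + root)
    return b_opt, eta_max


if __name__ == "__main__":
    state = dict(T1=1.0, T2=2.0, G1=1.0, G2=1.0, s11=1.5, s12=-1.0, s22=2.5)
    b_opt, eta_max = optimal_coupling_and_efficiency(**state)
    scan = max(partially_optimized_efficiency(-5.0 + 0.001 * k, **state)
               for k in range(10001))
    print(b_opt, eta_max, scan)
```

Increasing `G2` or `s22` in the example raises `chi` and hence the maximal efficiency, which is the route by which the stationary bound (43) is exceeded.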



VIII. COMPLEMENTARITY BETWEEN HEAT- 
AND INFORMATION-FLOW 

Contrary to the notion of information flow, the heat 
flow from one brownian particle to another calls for an 
additional discussion. The main reason for this is that 
the brownian particles could, in general, be non-weakly 
coupled to each other, and then the local energy of a 
single particle is not well defined; for various opinions 
on this point see [H, I n contrast, the entropy of a 
single particle is always well-defined; recall in this context 
feature 3 in section IIV Bl Nevertheless, the notion of a 
separate energy can be applied once there are physical 
reasons for selecting a particular form of the interaction 



Hamiltonian $H_{12}(x_1,x_2)$ in (20), and the average value of $H_{12}(x_1,x_2)$ is conserved in time, at least approximately. (A particular case of this is when $H_{12}(x_1,x_2)$ is small.) Now $H_1(x_1)$ can be defined as the local energy of the first particle, while the average energy change is

$$\frac{d}{dt} \int dx\, H_1(x_1)\, P(x;t) \equiv \frac{dE_1}{dt}.$$


The energy flow $e_{2\to 1}$ from the second particle to the first one is defined in full analogy with (37)

$$e_{2\to 1} = \frac{dE_1}{dt} - \frac{d_1Q}{dt}, \tag{76}$$

where $d_1Q/dt$ is defined in (31). In words (76) means: the energy change $dE_1/dt$ of the first particle is equal to the energy $e_{2\to 1}$ received from the second particle plus the energy $d_1Q/dt$ put into the system via coupling of the first particle to its thermal bath.
Working out (76) we obtain

$$e_{2\to 1} = -\int dx\, J_1(x;t)\, \partial_1 H_{12}(x). \tag{77}$$

The interpretation of (77) is straightforward via concepts introduced in Appendix C, where we argue that $v_1(x;t) = J_1(x;t)/P(x;t)$ can be regarded as a coarse-grained velocity of the particle 1. Then (77) becomes the average work done on the particle 1 by the force $-\partial_1 H_{12}(x)$ generated by the particle 2.

The symmetrized heat flow is equal to minus the change of the interaction energy

$$e_{2\to 1} + e_{1\to 2} = -\frac{d}{dt} \int dx\, H_{12}(x)\, P(x;t). \tag{78}$$



Thus there is a mismatch between the heat flowing from 1 to 2, as compared to the energy flowing from 2 to 1. This mismatch is driven by the change of interaction energy, and it is small provided that the average interaction energy is [approximately] conserved in time.

It is important to stress that the ambiguity in the definition of heat flow, which was related to the choice of the local energy $E_1$, is absent in the stationary state, since the average interaction energy is conserved by construction. Now the choice of the single-particle energy is irrelevant, since any such definition will lead to a time-independent quantity, which would then disappear from (76), $dE_1/dt = 0$, and from the RHS of (78).

As is clear from its physical meaning, the efficiency $\zeta_{2\to 1}$ of heat flow is to be defined as the ratio of the heat flow over the total heat dissipation

$$\zeta_{2\to 1} = \frac{e_{2\to 1}}{T_1\ell_1 + T_2\ell_2}. \tag{79}$$

Let us work out $\zeta_{2\to 1}$ for the two-temperature stationary case. Analogously to (41, 42) we obtain

$$\zeta_{2\to 1} = \frac{T_1}{T_2 - T_1} + \frac{T_1\ell_1}{T_1\ell_1 + T_2\ell_2} \tag{80}$$
$$\phantom{\zeta_{2\to 1}} = \frac{T_2}{T_2 - T_1} - \frac{T_2\ell_2}{T_1\ell_1 + T_2\ell_2}. \tag{81}$$

Eq. (80) shows that, as expected, the heat flows from higher to lower temperatures. In addition, (80, 81) imply the same bounds as in (43):



IX. TRANSFER ENTROPY 



A. Definition 



$$\frac{T_1}{T_2 - T_1} \leq \zeta_{2\to 1} \leq \frac{T_2}{T_2 - T_1}. \tag{82}$$



Moreover, we note that the stationary efficiencies of information flow and heat flow are related as

$$\zeta_{2\to 1}\, \eta_{2\to 1} = \frac{T_1 T_2}{(T_2 - T_1)^2}. \tag{83}$$



Note that the efficiencies $\zeta_{2\to 1}$ and $\eta_{2\to 1}$ in (83) depend in general on the inter-particle and intra-particle potentials, temperatures, damping constants, etc. However, their product in the stationary state is universal, i.e., it depends only on the temperatures.

Eq. (83) implies the following complementarity: for fixed temperatures the set-up most efficient for the information transfer is the least efficient for the heat transfer and vice versa. Recalling our discussion in section VI B we see that the upper bound $\frac{T_2}{T_2 - T_1}$ for the efficiency $\zeta_{2\to 1}$ is achieved for the adiabatic stationary state, where (contrary to the information transfer) the hotter system is faster, i.e., $\ell_2 \to 0$, but $\ell_1$ is finite.

Note from (47, 48) the following relation between the heat flow (77) and the information flow (54):

$$T_1\, \imath_{2\to 1} = e_{2\to 1},$$

which is valid in the adiabatic stationary state. Recall that $T_1$ here is the temperature of the bath interacting with the fast particle.

Let us illustrate the obtained results with the exactly solvable situation of two coupled harmonic oscillators; see section VII. Recalling (|55H55|) and (77) we obtain

$$e_{2\to 1} = \frac{a_2 T_2}{\Gamma_2}\, \frac{\kappa\varphi\, (1 - \xi)}{1 + \varphi}, \tag{84}$$

where we employed the dimensionless variables (65). Likewise, we get for the heat dissipation and the efficiency of the heat transfer [see (66)-(68) for similar formulas]

$$T_1\ell_1 + T_2\ell_2 = \frac{a_2 T_2}{\Gamma_2}\, \frac{\kappa\varphi\, (1 - \xi)^2 (\xi + \varphi)}{\xi (1 + \varphi)^2 + \kappa\varphi\, (1 - \xi)^2}, \tag{85}$$

$$\zeta_{2\to 1} = \frac{\xi (1 + \varphi)^2 + \kappa\varphi\, (1 - \xi)^2}{(1 - \xi)(1 + \varphi)(\xi + \varphi)}. \tag{86}$$

It is seen that the efficiency $\zeta_{2\to 1}$ maximizes at the instability threshold $\kappa \to 1$, in contrast to the efficiency $\eta_{2\to 1}$ of information transfer, which maximizes at the weakest interaction; see section VII. As we already discussed above, as a function of the time-scale parameter $\varphi$, $\zeta_{2\to 1}$ maximizes for $\varphi \to 0$, again in contrast to the behaviour of $\eta_{2\to 1}$.
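The complementarity (83) can be verified directly from the closed forms (68) and (86). The following sketch is illustrative only; the function names `eta_info` and `zeta_heat` and the chosen parameter values are assumptions, not notation from the paper.

```python
def eta_info(phi, xi, kappa):
    """Stationary efficiency of information flow, Eq. (68)."""
    N = kappa * phi * (1 - xi) ** 2 + xi * (1 + phi) ** 2
    return xi * (xi + phi) * (1 + phi) / ((1 - xi) * N)


def zeta_heat(phi, xi, kappa):
    """Stationary efficiency of heat flow, Eq. (86)."""
    N = kappa * phi * (1 - xi) ** 2 + xi * (1 + phi) ** 2
    return N / ((1 - xi) * (1 + phi) * (xi + phi))


if __name__ == "__main__":
    xi = 0.5                                   # xi = T1/T2, so T2 = 2 T1
    for kappa in (0.01, 0.25, 0.99):
        for phi in (0.01, 1.0, 100.0):
            e, z = eta_info(phi, xi, kappa), zeta_heat(phi, xi, kappa)
            # The product equals T1*T2/(T2 - T1)^2 = xi/(1 - xi)^2, Eq. (83).
            print(kappa, phi, e, z, e * z)
```

In the printed table the product stays fixed while the two factors move in opposite directions, both in the coupling `kappa` and in the time-scale ratio `phi`.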



We consider the task of predicting the future of $X_1$ from its own past (with or without the help of the present of $X_2$). Quantifying this task leads to a concept which has been termed directed transinformation [35] or transfer entropy [36]. In the following we will use the latter name, as this seems to be more broadly accepted. Closely related ideas were expressed in [37]. The notion of transfer entropy recently became popular among researchers working in various inter-disciplinary fields; see [38] for a short review.

Our discussion of transfer entropy aims at two pur- 
poses. First, the information flow and transfer entropy 
are two different notions, and their specific differences 
should be clearly understood, so as to avoid any confu- 
sion [3^ ]. Nevertheless, in one particular, but important 
case, we found interesting relations between these two 
notions. 

To introduce the idea of transfer entropy let us for the moment assume that the random quantities $X_1(t)$ and $X_2(t)$ (whose realizations are respectively the coordinates $x_1$ and $x_2$ of the brownian particles) assume discrete values and change at discrete instances of time: $t,\ t+\tau,\ t+2\tau, \ldots$ Recalling our discussion in section II we see that the conditional entropy $S[X_1(t+\tau)|X_1(t)]$ is the residual uncertainty of $X_1(t+\tau)$ once $X_1(t)$ is known. Likewise, $S[X_1(t+\tau)|X_1(t),X_2(t)]$ characterizes the uncertainty of $X_1(t+\tau)$ given both $X_1(t)$ and $X_2(t)$. The difference $m_{2\to 1}$ is the transfer entropy:

$$m_{2\to 1} = \frac{1}{\tau} \Big( S[X_1(t+\tau)|X_1(t)] - S[X_1(t+\tau)|X_1(t),X_2(t)] \Big)$$
$$\phantom{m_{2\to 1}} = \frac{1}{\tau} \Big( I[X_1(t+\tau):X_1(t),X_2(t)] - I[X_1(t+\tau):X_1(t)] \Big)$$
$$\phantom{m_{2\to 1}} = \frac{1}{\tau} \sum_{x_1,y_1,y_2} p(y_1,y_2;t)\, p(x_1;t+\tau|y_1,y_2;t)\, \ln \frac{p(x_1;t+\tau|y_1,y_2;t)}{p_1(x_1;t+\tau|y_1;t)}. \tag{87}$$

$m_{2\to 1}$ measures the difference between predicting the future of $X_1$ from the present of both $X_1$ and $X_2$ (quantified by $I[X_1(t+\tau):X_1(t),X_2(t)]$) and predicting the future of $X_1$ from its own present only (quantified by $I[X_1(t+\tau):X_1(t)]$). Note that $m_{2\to 1}$ is always non-negative, since additional conditioning decreases the entropy. $m_{2\to 1}$ is also equal to the mutual information $I[X_1(t+\tau):X_2(t)|X_1(t)]$ shared between the present state of $X_2$ and the future state of $X_1$ conditioned upon the present state of $X_1$.

To explain the transfer entropy versus information flow, we consider again the discretized version in Fig. 1 (where for simplicity we take $\tau = 1$). By standard properties of mutual information [27], we have

$$I[X_1(t+1):X_2(t)] \leq I[X_1(t+1),X_1(t):X_2(t)] \tag{88}$$
$$\phantom{I[X_1(t+1):X_2(t)]} = I[X_1(t):X_2(t)] + I[X_1(t+1):X_2(t)|X_1(t)], \tag{89}$$

where the inequality in (88) is related to the strong sub-additivity feature, and where the equality in (89) is the chain rule for the mutual information [27]. Hence,

$$\imath_{2\to 1} = I[X_1(t+1):X_2(t)] - I[X_1(t):X_2(t)] \leq I[X_1(t+1):X_2(t)|X_1(t)] = m_{2\to 1}.$$

The LHS (left hand side) is the discrete version of information flow, the RHS the transfer entropy. If the arrow from $X_2(t)$ to $X_1(t+1)$ was absent, the modified graph would impose [28] that $X_1(t+1)$ and $X_2(t)$ are conditionally independent, given $X_1(t)$, i.e.,

$$m_{2\to 1} = I[X_1(t+1):X_2(t)|X_1(t)] = 0.$$

This shows that the transfer entropy vanishes in this case, as it should because there is no arrow transmitting information. Thus, $m_{2\to 1}$ characterizes the strength of that arrow.
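The discrete inequality between information flow and transfer entropy can be illustrated on any joint distribution of $X_1(t)$, $X_2(t)$ and $X_1(t+1)$. The sketch below uses $\tau = 1$ and a hypothetical array layout `p[x1, x2, y1]` (an assumption made here) and evaluates both sides from Shannon entropies; the helper names are illustrative.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (natural logarithm) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def mutual_information(pxy):
    """I[X:Y] for a two-dimensional joint probability table."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)


def flow_and_transfer_entropy(p):
    """p[x1, x2, y1] is the joint probability of X1(t)=x1, X2(t)=x2, X1(t+1)=y1.

    Returns the discrete information flow and the transfer entropy (tau = 1).
    """
    p = np.asarray(p, dtype=float)
    i_now = mutual_information(p.sum(axis=2))     # I[X1(t) : X2(t)]
    i_next = mutual_information(p.sum(axis=0))    # I[X2(t) : X1(t+1)]
    # Conditional mutual information I[X1(t+1) : X2(t) | X1(t)]
    m = (entropy(p.sum(axis=2)) + entropy(p.sum(axis=1))
         - entropy(p.sum(axis=(1, 2))) - entropy(p))
    return i_next - i_now, m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.random((2, 3, 2))
    p /= p.sum()
    print(flow_and_transfer_entropy(p))
```

When the joint table factorizes as `p12[x1, x2] * q[x1, y1]`, i.e., the arrow from $X_2(t)$ to $X_1(t+1)$ is absent, the returned transfer entropy vanishes up to rounding.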

For continuous time [but still discrete variables $X_1$ and $X_2$] we take $\tau \to 0$ in (87) producing

$$m_{2\to 1} = \sum_{y_1,\, y_2,\, x_1 \neq y_1} p(y_1,y_2;t)\, g_1(x_1|y_1;y_2)\, \ln \frac{g_1(x_1|y_1;y_2)}{\sum_{z_2} g_1(x_1|y_1;z_2)\, p_{2|1}(z_2|y_1;t)}, \tag{90}$$

where $g_1(x_1|y_1;y_2)$ is defined analogously to $G_1$ in (12), but with discrete random variables.

Eq. (90) cannot be translated to the continuous variable situation simply by interchanging the probabilities $p$ with the probability densities $P$, since attempting such a translation leads to singularities. The proper extension of (87) to continuous variables and continuous time reads

$$m_{2\to 1} = \lim_{\tau \to 0} \frac{1}{\tau} \int dy\, P(y;t) \int \mathcal{P}[dx_1|_t^{t+\tau}\, |\, y;t]\, \ln \frac{\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y;t]}{\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y_1;t]}, \tag{91}$$

where $\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y;t]$ ($\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y_1;t]$) is the measure of all paths $x_1(t)$ starting from $y = (y_1,y_2)$ (from $y_1$) at time $t$ and ending somewhere at time $t+\tau$, i.e., it is not the final point of the path that is fixed, but rather the initial and final times $^{8}$. This extension naturally follows general ideas of information theory in continuous spaces [26]. For our situation the measures $\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y;t]$ and $\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y_1;t]$ refer to the stochastic process described by I®©; see [61] for an introduction to such measures.

Integrating $\mathcal{P}[dx_1|_t^{t+\tau}\, |\, y;t]$ over all paths starting from $y$ at time $t$ and ending at $x_1$ at time $t+\tau$ we get the conditional probability density $P(x_1,t+\tau|y,t)$.

Eq. (91) is worked out in Appendix D producing

m 2 ^i - J dxP(x;i)^i(x;t), (92) 

$$\phi_{2\to 1} = \partial_1 H_{12}(x) - \int dy_2\, \partial_1 H_{12}(x_1,y_2)\, P_{2|1}(y_2|x_1;t),$$

where $\phi_{2\to 1}$ is the force acting from 2 to 1 minus its
conditional average; compare with 



B. Information flow versus entropy transfer: General differences.

Let us compare features of the entropy transfer $m_{2\to 1}$
to those of the information flow $\imath_{2\to 1}$. Recall that the
difference between the information flow $\imath_{2\to 1}$ and the transfer
entropy $m_{2\to 1}$ stems from the fact that $m_{2\to 1}$ refers to the
prediction of the future of $X_1$ from its own past (with or
without the help of the present of $X_2$), while $\imath_{2\to 1}$ refers
to the prediction gain (or loss) of the future of $X_1$ from
the present of $X_2$. Thus for $m_{2\to 1}$ the active agent is 1
predicting its own future, while for $\imath_{2\to 1}$ the active agent
is 2 predicting the future of 1.

i) Both $m_{2\to 1}$ and $\imath_{2\to 1}$ are invariant with respect to
redefining the interaction Hamiltonian; see (23) and
compare with the information flow $\imath_{2\to 1}$.

ii) In contrast to the information flow $\imath_{2\to 1}$, the entropy
transfer $m_{2\to 1}$ is always non-negative.

iii) In contrast to $\imath_{2\to 1}$, $m_{2\to 1}$ does not nullify for
factorized probabilities $P(x) = P_1(x_1) P_2(x_2)$, provided that
there is a non-trivial interaction $H_{12}$. This is because $m_{2\to 1}$
is defined with respect to the transition probabilities; see
(EZD-

iv) $m_{2\to 1}$ nullifies whenever there is no force acting
from one particle to another. Recall that the force-driven
part $\imath^{F}_{2\to 1}$ of the information flow also nullifies together
with the force, albeit $\imath^{F}_{2\to 1}$ nullifies also for factorized
probabilities; see (25).

v) In contrast to $\imath_{2\to 1}$, $m_{2\to 1}$ does not nullify at
equilibrium. Thus, 2 can help 1 in predicting its future at
absolutely no thermodynamic cost. However, as we have
shown, there is a definite thermodynamic cost for 2 wanting
to predict the future of 1 better than it predicts the
present of 1.

vi) $m_{2\to 1}$ is not a flow, since it does not add up
additively to the time-derivative of any global quantity.
However, $m_{2\to 1}$ obviously does refer to some type of
information processing. In fact, $m_{2\to 1}$ underlies the notion of
Granger-causality, which was first proposed in the context
of econometrics [H, H3] (see [43] for a review): the
ratio $m_{2\to 1}/m_{1\to 2}$ quantifies the strength of causal influences
from 2 to 1 relative to those from 1 to 2 (footnote 9). The notion of
Granger-causality is useful, as witnessed by its successful
empirical applications [SI \M, M, S3 •. For $m_{2\to 1}/m_{1\to 2} > 1$
we shall say that 2 is Granger-driving 1.



C. Information flow versus transfer entropy in the 
adiabatic stationary limit 

Given the differences between the information flow and
the entropy transfer it is curious to note that in the
adiabatic stationary situation (see section VI B) there exists
a direct relation between them. We recall that the
adiabatic situation is special, since the efficiency of
information flow reaches its maximal value there. Recall that
this situation is defined (besides the long-time limit) by
condition (44), which means that 2 is slow, while 1 is fast;
at equilibrium, when $T_1 = T_2$, this slow versus fast
separation becomes irrelevant. Recalling also the definition
(25) of the force-driven part of the information flow,
we see that



m M \ r i/ 



ri 



l = »1. (93) 



The first relation in (93) indicates that the slow system
is Granger-driving the fast one, while the second relation
implies that the same qualitative conclusion is obtained from
looking at $\imath^{F}_{2\to 1}$. This point is strengthened by noting
from (46, 92) that in the adiabatic, stationary,
two-temperature situation we have



1 



m 2 ^i 



(94) 



A less straightforward relation holds for the action of the
fast system on the slow one



mi- 



T 2 2 H 



(95) 



Recall that for the considered adiabatic stationary state,
it is the action of the fast on the slow that determines
the magnitude of the information flow $\imath_{1\to 2}$ (the sign of
$\imath_{1\to 2}$ is fixed by the temperature difference):



11^2 — — 12- 



Ti 



1^2- 



(96) 



It is seen that provided $T_2 > T_1$ (i.e., the slow system
is attached to the hotter bath, a situation realized for
the optimal information transfer) the Granger-driving
qualitatively coincides with the causality intuition
implied by the sign of information flow: both $\imath_{2\to 1} > 0$ and
$m_{2\to 1}/m_{1\to 2} \gg 1$ hold, which means that 2 predicts better the
future of 1 ($\imath_{2\to 1} > 0$) and that 2 is more relevant for
helping 1 to predict its own future ($m_{2\to 1}/m_{1\to 2} \gg 1$).

Footnote 9: The usage of the ratio $m_{2\to 1}/m_{1\to 2}$ is obligatory, since $m_{2\to 1}$ does not
characterize the absolute strength of the influence of 2 on 1. Note,
e.g., that $m_{2\to 1}$ nullifies not only for independent processes $X_1(t)$
and $X_2(t)$, but also for identical (strongly coupled) processes
$X_1(t) = X_2(t)$; see (87). This point is made in [35].



It is tempting to suggest that only when the causality
intuition deduced from $\imath_{2\to 1}$ agrees with that deduced
from $m_{2\to 1}$ are we closer to gaining a real understanding of
causality (still without doing actual interventions).
Interestingly, the present slow-fast two-temperature adiabatic
system was considered recently from the viewpoint
of other non-interventional causality detection methods,
reaching a similar conclusion: unambiguous causality can
be detected in this system if the slow variable is attached
to the hot thermal bath [62].

For the harmonic-oscillator example treated in section
VIII we obtain



m 2^1 = 



bB 



2A 2 T 1 T 1 



A 2 T 1 



(97) 



Employing formulas (57, 58) for the stationary
state we note the following relation



T 2 mi_>2 
1 «(£-!) 



tp + 1 



(98) 
(99) 



where the dimensionless parameters $\kappa$, $\xi$ and $\varphi$ are
defined in (65). In the adiabatic situation (98) is naturally
consistent with (95, 96), but for the considered harmonic
oscillators it is valid more generally, i.e., for an arbitrary
stationary state. Eq. (99) explicitly demonstrates the
conflict in Granger-driving between making the oscillator
2 slow (i.e., $\varphi \to \infty$) and making it cold (i.e., $\xi \to \infty$):
for $\varphi \to \infty$, $m_{2\to 1}/m_{1\to 2}$ tends to infinity, while for $\xi \to \infty$,
$m_{2\to 1}/m_{1\to 2}$ tends to zero.
Note finally that 



m 2- 



Kip + £(1 + if) 2 : 



which means that apart from the adiabatic limit ($\varphi \to 0$
or $\varphi \to \infty$) there is no straightforward relation between
$\imath_{2\to 1}$ and $m_{2\to 1}$.

X. SUMMARY 

We have investigated the task of information transfer 
implemented on a special bi-partite physical system (pair 
of Brownian particles, each coupled to a bath). Our main 
conclusions are as follows: 

0. The information flow $\imath_{2\to 1}$ from one Brownian
particle to another is defined via the time-shifted mutual
information. For the considered class of systems this
definition coincides with the entropy flow, as defined in
statistical thermodynamics.

1. The information flow $\imath_{2\to 1}$ is a sum of two terms:
$\imath_{2\to 1} = \imath^{B}_{2\to 1} + \imath^{F}_{2\to 1}$, where the bath-driven contribution
$\imath^{B}_{2\to 1} \leq 0$ is minus the Fisher information, and where
the force-driven contribution $\imath^{F}_{2\to 1}$ has to be positive and
large enough for the particle 2 to be an information
source for the particle 1.

2. No information flow from one particle to another is
possible in equilibrium. This fact is recognized in the
literature [17], though by itself it does not yet point to a
definite thermodynamic cost for information transfer.

3. For a stationary non-equilibrium state created by
a finite difference between two temperatures $T_1 < T_2$,
the ratio of the information flow to the total entropy
production (i.e., the efficiency of information flow) is
limited from above by $\frac{T_2}{T_2 - T_1}$. This bound for the
efficiency defines the minimal thermodynamic cost of
information flow for the studied setup. Note that it is not the total
amount of transferred information, but rather its rate, that is
limited. Thus the thermodynamic cost accounts also for
the time during which the information is transferred.

4. The upper bound $\frac{T_2}{T_2 - T_1}$ is reachable in the
adiabatic limit, where the sub-systems have widely different
characteristic times. The information flow is then small
on the time-scale of the fast motion, but sizable on the
time-scale of the slow motion.

5. The information transfer between two sub-systems
(Brownian particles) naturally nullifies if these systems
are not interacting, and were not interacting in the
past. It is thus relevant to study how the efficiency
and information flow depend on the inter-particle
coupling strength. As functions of the inter-particle coupling
strength, the efficiency and information flow demonstrate
the following complementarity. The information flow
is maximized at the instability threshold of the system
(which is reached at the strongest coupling compatible
with stability). On the contrary, the efficiency is
maximized for the weakest coupling.

6. There are special two-temperature, but non-stationary
scenarios, where the efficiency of information
flow is much larger than $\frac{T_2}{T_2 - T_1}$, but it is still limited by
the basic parameters of the system (the ratio of the
time-scales and the ratio of temperatures).

7. Analogous considerations can be applied to the
energy (heat) flow from one sub-system to another. The
efficiency of the heat flow, which is defined as the heat
flow over the total amount of the heat dissipated in the
overall system, is limited from above by the same factor
$\frac{T_2}{T_2 - T_1}$ (assuming that $T_1 < T_2$). However, in the
stationary state there is a complementarity between heat flow
and information flow: the setup which is most efficient
for the information transfer is the least efficient for the
heat transfer and vice versa.

8. There are definite relations between the information
flow and the transfer entropy introduced in [35], [36].
The transfer entropy is not a flow of information, though
it quantifies some type of information processing in the
system, a processing that occurs without any
thermodynamic cost.

APPENDIX A: INFORMATION-THEORETIC
MEANING OF ENTROPY AND MUTUAL
INFORMATION

1. Entropy

Let us recall the information-theoretic meaning of
entropy (H|), i.e., in which specific operational sense $S[X]$
quantifies the amount of information contained in the
random variable $X$. To this end imagine that $X$ is a
composition of $N$ random variables $\{X(1), \ldots, X(N)\}$, i.e.,
$X$ is a random process. We assume that this process is
ergodic [25]. The simplest example of such a process is
the case when $X(1), \ldots, X(N)$ are all independent and
identically distributed,

$$ p(x(1), \ldots, x(N)) = \prod_{k=1}^{N} p(x(k)), \qquad (A1) $$

where $x(k) = 1, \ldots, n$ parametrize realizations of $X(k)$.
Note that for an ergodic process the entropy in the limit
$N \gg 1$ scales as $\propto N$, e.g., $S[X] = -N \sum_{k=1}^{n} p(k) \ln p(k)$
for the above example (A1).

For $N \gg 1$, the set of $n^N$ realizations of the ergodic
process $X$ can be divided into two subsets [25], [26], [27].
The first subset $\Omega(X)$ is called typical, since this is the
minimal subset with the probability converging to 1 for
$N \gg 1$ [25], [26], [27]; the convergence is normally
exponential over $N$. The number of elements in $\Omega(X)$ grows
asymptotically as $e^{S[X]}$ for $N \gg 1$. These elements have
(nearly) equal probabilities $e^{-S[X]}$. Thus the number
of elements in $\Omega(X)$ is generally much smaller than the
overall number of realizations $e^{N \ln n}$.

The overall probability of those realizations which do
not fall into $\Omega(X)$ scales as $e^{-\mathrm{const}\, N}$, and is negligible in
the thermodynamic limit $N \gg 1$. All these features are
direct consequences of the law of large numbers, which
holds for ergodic processes at least in its weak form [25],
[26], [27]. Since in the limit $N \gg 1$ the realizations of the
original random variable $X$ can be in a sense substituted
by the typical set $\Omega(X)$, the number of elements in $\Omega(X)$
characterizes the information content of $X$ [25], [26], [27].
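The properties of the typical set can be made concrete for the independent, identically distributed example. The following Python sketch assumes binary symbols with a hypothetical bias `p = 0.1`, a tolerance `eps`, and the common textbook criterion that a sequence is typical when minus the logarithm of its probability per symbol lies within `eps` of the entropy per symbol. These choices are illustrative and are not taken from the text.

```python
import math

def typical_set_stats(n, p, eps):
    """Typical set of n i.i.d. binary symbols with P(symbol = 1) = p.

    A sequence is counted as typical when -(1/n) ln(probability) lies
    within eps of the entropy per symbol. Returns the entropy per symbol,
    the total probability of the typical set, and (1/n) ln(its size)."""
    s = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    prob = 0.0
    log_counts = []
    for k in range(n + 1):                       # k = number of ones
        log_p_seq = k * math.log(p) + (n - k) * math.log(1 - p)
        if abs(-log_p_seq / n - s) <= eps:
            # ln of the binomial coefficient: number of sequences with k ones
            log_c = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
            prob += math.exp(log_c + log_p_seq)
            log_counts.append(log_c)
    top = max(log_counts)                        # log-sum-exp for the set size
    log_size = top + math.log(sum(math.exp(c - top) for c in log_counts))
    return s, prob, log_size / n

s, prob, rate = typical_set_stats(n=2000, p=0.1, eps=0.05)
print(s, prob, rate, math.log(2))
```

For these parameters the typical set carries almost all of the probability, while the logarithm of its size per symbol stays within `eps` of the entropy per symbol and well below $\ln 2$, the value for the full set of $2^N$ realizations.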

2. Mutual information 

While the entropy $S[X]$ reflects the information
content of the (noiseless) probabilistic information source,
the mutual information $I[Y : X]$ characterizes the
maximal information which can be shared through a noisy
channel, where $X$ and $Y$ correspond to the input and
output of the channel [25], [26], [27], respectively. To
understand the qualitative content of this relation consider
an ergodic process $XY = \{X(1)Y(1), \ldots, X(N)Y(N)\}$,
where $x(l) = 1, \ldots, n$ and $y(l) = 1, \ldots, n$ are the
realizations of $X(l)$ and $Y(l)$, respectively.

One now looks at $X$ ($Y$) as the input (output) of a
noisy channel [25], [26], [27]. In the limit $N \to \infty$ we can
study the typical sets $\Omega(\cdot)$ instead of the full set of
realizations for the random variables. It appears for $N \to \infty$
that the typical sets $\Omega(X)$ and $\Omega(Y)$ can be represented
as unions of $M = e^{I[X:Y]}$ non-overlapping subsets [26]:

$$ \Omega(X) = \bigcup_{\alpha=1}^{M} \omega_\alpha(X), \qquad \Omega(Y) = \bigcup_{\alpha=1}^{M} \omega_\alpha(Y), \qquad (A2) $$

such that for $N \gg 1$

$$ p[y \in \omega_\alpha(Y) \,|\, x \in \omega_\beta(X)] = \delta_{\alpha\beta}\, e^{-S[Y|X]}, \qquad (A3) $$
$$ p[x \in \omega_\alpha(X) \,|\, y \in \omega_\beta(Y)] = \delta_{\alpha\beta}\, e^{-S[X|Y]}. \qquad (A4) $$

Note that the number of elements in $\omega_\alpha(X)$ ($\omega_\alpha(Y)$) is
asymptotically $e^{S[X|Y]}$ ($e^{S[Y|X]}$). Eqs. (A3, A4) mean
that the realizations from $\omega_\alpha(Y)$ correlate only with
those from $\omega_\alpha(X)$, and that all realizations within $\omega_\alpha(X)$
and within $\omega_\alpha(Y)$ are equivalent in the sense of (A3, A4).
It should be clear that once the elements of $\omega_\alpha(X)$ are
completely mixed during the mapping to $\omega_\alpha(Y)$, the only
reliable way of sending information through this noisy
channel is to relate the reliably shared words to the sets
$\omega_\alpha$ [26]. Since there are $e^{I[X:Y]}$ such sets, the number of
reliably shared words is limited by $e^{I[X:Y]}$. Note that the
number of elements in the typical set $\Omega(X)$ is equal to
$e^{S[X]}$, which in general is much larger than $e^{I[X:Y]}$.
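The counting argument above can be illustrated numerically. The following Python sketch assumes a binary symmetric channel with a hypothetical flip probability `flip = 0.1`, a uniform input, and independent channel uses, so that the entropies of the $N$-symbol process are $N$ times the single-use values. The channel and all names are assumptions for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(p_xy):
    """Marginal distributions of X and Y from the joint dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def mutual_information(p_xy):
    """I[X:Y] = sum p(x,y) ln[p(x,y) / (p(x) p(y))] in nats."""
    px, py = marginals(p_xy)
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in p_xy.items() if p > 0)

flip = 0.1                                     # hypothetical channel noise
p_xy = {(x, y): 0.5 * ((1 - flip) if x == y else flip)
        for x in (0, 1) for y in (0, 1)}
px, py = marginals(p_xy)

i_xy = mutual_information(p_xy)
s_x = entropy(px.values())
s_x_given_y = entropy(p_xy.values()) - entropy(py.values())   # S[X,Y] - S[Y]

n_uses = 100
log_shared_words = n_uses * i_xy      # ln of the number of sets omega_alpha
log_typical_inputs = n_uses * s_x     # ln of the number of typical inputs
print(i_xy, s_x - s_x_given_y, log_shared_words, log_typical_inputs)
```

The sketch reproduces the identity $I[X:Y] = S[X] - S[X|Y]$, which is the counting statement that $e^{S[X]}$ typical inputs are grouped into $e^{I[X:Y]}$ sets of $e^{S[X|Y]}$ elements each.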



APPENDIX B: OPERATIONAL DEFINITION OF 
INFORMATION FLOW 

Here we demonstrate that the definition (JTTJ) of the
information flow is recovered from an operational approach
proposed in [|(|; see [HI, H3| for related works.

Let us imagine that at some time $\bar t$ we suddenly
increase the damping constant $\Gamma_2$ to some very large value.
This will freeze the dynamics of the second particle, so
that for $t > \bar t$ the joint probability distribution
$\tilde P(x;t)$ satisfies the following modified
Fokker-Planck equation

$$ \partial_t \tilde P(x;t) + \partial_1 \tilde J_1(x;t) = 0, \qquad (B1) $$
$$ -\Gamma_1 \tilde J_1(x;t) = \tilde P(x;t)\, \partial_1 H(x) + T_1\, \partial_1 \tilde P(x;t), \qquad (B2) $$

together with the boundary condition

$$ \tilde P(x;\bar t) = P(x;\bar t), \qquad (B3) $$

where $P(x;t)$ satisfies the Fokker-Planck equation ([HUH]).
Other methods of freezing the dynamics of $X_2(t)$ (e.g.,
switching on a strong confining potential $H(x_2)$ acting
on $X_2$) would work for the present purposes equally well.

The fact of freezing should be apparent from (B1), whose
solution can be represented [using also (B3)] as

$$ \tilde P(x;t) = \tilde P_{1|2}(x_1|x_2;t)\, P_2(x_2;\bar t), \qquad t > \bar t. \qquad (B4) $$

Note that once $X_2$ is frozen, it becomes a random
external field from the viewpoint of the dynamics of $X_1(t)$.
The entropy of a system in a random field is standardly
calculated via averaging (over the field distribution) the
entropy calculated at a fixed field:

$$ \tilde S_{1|2} = -\int dx\, P_2(x_2;\bar t)\, \tilde P_{1|2}(x_1|x_2;t)\, \ln \tilde P_{1|2}(x_1|x_2;t). $$

We select $t \to \bar t + 0$ and note that the freezing does
not influence directly the marginal entropy rate of the first
particle

$$ \frac{d\tilde S_1}{dt}\Big|_{t \to \bar t + 0}
= -\frac{d}{dt} \int dx_1\, \tilde P_1(x_1;t) \ln \tilde P_1(x_1;t) \Big|_{t \to \bar t + 0}
= -\frac{d}{dt} \int dx_1\, P_1(x_1;t) \ln P_1(x_1;t) \Big|_{t = \bar t}
= \frac{dS_1}{dt}, $$

a fact that follows from (B1, B3, B4).

Now we subtract from the entropy rate of the first
particle the rate of the conditional entropy $\tilde S_{1|2}$:

$$ \imath_{2\to 1}(\bar t) = \frac{dS_1}{dt} - \frac{d\tilde S_{1|2}}{dt}\Big|_{t \to \bar t + 0}, \qquad (B5) $$

where the conditioning $t = \bar t$ is done after taking $\frac{d}{dt}$.
Thus, $\imath_{2\to 1}(t)$ is that part of the entropy change of $X_1$
(between $t$ and $t + \tau$) which exists due to fluctuations of
$X_2(t)$; see section iBBl

Employing (B4) we note for the last part in (B5)

$$ \frac{d\tilde S_{1|2}}{dt} = \frac{d}{dt} \left[ -\int dx\, \tilde P(x;t) \ln \tilde P(x;t) \right] = \frac{d\tilde S}{dt}, $$

which means that $\imath_{2\to 1}(t)$ can be defined equivalently via
the total entropy rate of the overall system, with the
second particle being frozen. Now using

$$ \frac{dS_1}{dt} = \int dx\, [\ln P_1(x_1;t)]\, \partial_1 J_1(x;t), $$
$$ \frac{d\tilde S}{dt} = \int dx\, [\ln P(x;t)]\, \partial_1 J_1(x;t), $$

we get back from (B5) to (p~9|), confirming that both
definitions are equivalent.
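For a concrete feeling of the resulting expression, $\imath_{2\to 1} = \int dx\, [\ln P_1 - \ln P]\, \partial_1 J_1$, one can evaluate it for a Gaussian state. The sketch below rests on assumptions that are not part of this appendix: a zero-mean Gaussian $P(x;t)$ with covariance entries `c11`, `c12`, `c22`, and a bilinear Hamiltonian $H = a_1 x_1^2/2 + a_2 x_2^2/2 + b x_1 x_2$. Under these assumptions an integration by parts gives a closed form with two pieces, one from each term of $J_1$ in the form (B2); this closed form is worked out here for illustration and is not quoted from the text.

```python
def information_flow_gaussian(c11, c12, c22, b, gamma1, temp1):
    """Flow from 2 to 1 for a zero-mean Gaussian state (illustrative closed form).

    Returns (force_part, bath_part):
      force_part comes from the P * d_1 H term of the current J_1,
      bath_part  comes from the T_1 * d_1 P term and is never positive."""
    det = c11 * c22 - c12 * c12
    if c11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    force_part = -b * c12 / (gamma1 * c11)
    bath_part = -temp1 * c12 * c12 / (gamma1 * c11 * det)
    return force_part, bath_part

def equilibrium_covariance(a1, a2, b, temp):
    """Covariance of the Gibbs state exp(-H / temp) for the bilinear H."""
    det_h = a1 * a2 - b * b
    return temp * a2 / det_h, -temp * b / det_h, temp * a1 / det_h

# Hypothetical parameters.
a1, a2, b, gamma1, temp = 1.0, 2.0, 0.5, 1.0, 1.5
c11, c12, c22 = equilibrium_covariance(a1, a2, b, temp)

# Equilibrium: the bath and the particle state share the same temperature.
f_eq, b_eq = information_flow_gaussian(c11, c12, c22, b, gamma1, temp)
# Same Gaussian state, but the bath of particle 1 is colder than the state.
f_ne, b_ne = information_flow_gaussian(c11, c12, c22, b, gamma1, 0.5 * temp)
print(f_eq + b_eq, f_ne + b_ne)
```

In this sketch the two pieces cancel exactly for the Gibbs state, in line with the statement that no information flows at equilibrium, while the second piece alone is never positive.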



APPENDIX C: COARSE-GRAINED VELOCITIES 
FOR BROWNIAN PARTICLES 

Consider an ensemble of all realizations of the
two-particle Brownian system which at time $t$ have a
coordinate vector $x$. For this ensemble the average
coarse-grained velocity for the particle with index $j$ might
naively be defined as:

$$ v_j(x,t) = \lim_{\epsilon \to 0} \int dy\, \frac{y_j - x_j}{\epsilon}\, P(y, t+\epsilon \,|\, x, t). \qquad (C1) $$

However, it was pointed out by Nelson [57] that the
absence of regular trajectories enforces one to define
different velocities for different directions of time:

$$ v_{+,j}(x,t) = \lim_{\epsilon \to +0} \int dy\, \frac{y_j - x_j}{\epsilon}\, P(y, t+\epsilon \,|\, x, t), \qquad (C2) $$
$$ v_{-,j}(x,t) = \lim_{\epsilon \to +0} \int dy\, \frac{x_j - y_j}{\epsilon}\, P(y, t-\epsilon \,|\, x, t). \qquad (C3) $$

The physical meaning of these expressions is as follows:
$v_{+,j}(x,t)$ is the average velocity to move anywhere
starting from $(x,t)$, whereas $v_{-,j}(x,t)$ is the average velocity
to come from anywhere and to arrive at $x$ at the
moment $t$. Since these velocities are defined already in the
overdamped limit, $\epsilon$ is assumed to be much larger than
the characteristic relaxation time of the (real)
momentum, which is small in the overdamped limit.
Therefore, we call (C2, C3) coarse-grained velocities. It is
known that for the overdamped Brownian motion almost
all trajectories are not smooth. This is connected to
the chaotic influences of the bath(s) which randomize
the real momenta on much smaller times, and this is
also the reason for $v_{+,j}(x,t) \neq v_{-,j}(x,t)$. The
difference $v_{+,j}(x,t) - v_{-,j}(x,t)$ thus characterizes the degree
of the above non-smoothness. One now can show that
[Eil, HI

$$ v_{+,j}(x,t) = -\frac{1}{\Gamma_j}\, \partial_j H(x), $$
$$ v_{-,j}(x,t) = -\frac{1}{\Gamma_j} \left[ \partial_j H(x) + 2 T_j\, \partial_j \ln P(x,t) \right]. $$

We now see that the probability times the average
coarse-grained velocity $\frac{1}{2}[v_{+,j}(x,t) + v_{-,j}(x,t)] = v_j(x,t)$
amounts to the probability current of the Fokker-Planck
equation (JH1 EJ):

$$ v_j(x,t)\, P(x;t) = J_j(x;t). $$

If one would take $\epsilon$ in (C2, C3) much smaller than the
characteristic relaxation time of the momentum (which
would amount to applying definitions (C2) and (C3) to
a smoother trajectory), then $v_{+,j}(x,t)$ and $v_{-,j}(x,t)$
would be equal to each other and equal to the average
momentum; see [58] for more details.


APPENDIX D: CALCULATION OF TRANSFER
ENTROPY

Here we calculate the entropy transfer as defined in
(91). The measures entering this equation read for a
small $\tau$

$$ \mathcal{P}[dx_1|_t^{t+\tau} \,|\, y_1, y_2; t] = \mathcal{K} \exp\left\{ -\frac{1}{2\Gamma_1 T_1} \int_t^{t+\tau} ds\, \left[ \Gamma_1 \dot x_1(s) + \partial_1 H(x_1(s), y_2) \right]^2 \right\}, \qquad (D1) $$
$$ \mathcal{P}[dx_1|_t^{t+\tau} \,|\, y_1; t] = \mathcal{K} \exp\left\{ -\frac{1}{2\Gamma_1 T_1} \int_t^{t+\tau} ds\, \left[ \Gamma_1 \dot x_1(s) + h_1(x_1(s)) \right]^2 \right\}, \qquad (D2) $$

where $\mathcal{K}$ is the normalization constant, and where

$$ h_1(x_1) = \int dy_2\, \partial_1 H(x_1, y_2)\, P_{2|1}(y_2|x_1). \qquad (D3) $$

Both time integrals $\int_t^{t+\tau}$ in (D1, D2) are to be
interpreted in the Ito sense [47]. Due to this the normalization
constants for both path integrals are identical.

To understand the origin of (D1, D2) recall from (11,
12) that in the small-$\tau$ limit:

$$ P(x_1; t+\tau \,|\, y_1, y_2; t) = \delta(x_1 - y_1)
+ \frac{\tau}{\Gamma_1}\, \partial_1 \left[ \delta(y_1 - x_1)\, \partial_1 H(x_1, y_2) + T_1\, \partial_1 \delta(y_1 - x_1) \right], \qquad (D4) $$
$$ P(x_1; t+\tau \,|\, y_1; t) = \delta(x_1 - y_1)
+ \frac{\tau}{\Gamma_1}\, \partial_1 \left[ \delta(y_1 - x_1)\, h_1(x_1) + T_1\, \partial_1 \delta(y_1 - x_1) \right]. \qquad (D5) $$



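For a small $\tau$ we get

$$ \frac{1}{\tau} \ln \frac{\mathcal{P}[dx_1|_t^{t+\tau} \,|\, y; t]}{\mathcal{P}[dx_1|_t^{t+\tau} \,|\, y_1; t]}
= \frac{h_1^2(y_1) - [\partial_1 H(y_1, y_2)]^2}{2 \Gamma_1 T_1}
+ \frac{h_1(y_1) - \partial_1 H(y_1, y_2)}{T_1}\, \frac{x_1(t+\tau) - x_1(t)}{\tau}. \qquad (D6) $$

The fact that the time-integrals were taken in the Ito
sense is visible in the last term of (D6). Putting (D6)
into (91) and noting [for a small $\tau$]

$$ \int \mathcal{P}[dx_1|_t^{t+\tau} \,|\, y; t]\, \frac{x_1(t+\tau) - x_1(t)}{\tau} = -\frac{1}{\Gamma_1}\, \partial_1 H(y), $$

we end up at (92).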
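The last step, from the averaged logarithm to the mean square of the force fluctuation, is a completion of the square. Writing $f = \partial_1 H(y_1, y_2)$ and inserting the mean displacement rate $-f/\Gamma_1$ into the last term of (D6), the two terms share a common prefactor and combine into the bracket $h_1^2 - f^2 - 2(h_1 - f) f = (f - h_1)^2$. The Python sketch below checks this identity for a hypothetical discrete stand-in for $P_{2|1}(y_2|y_1)$ at one fixed $y_1$; the numbers are invented for illustration.

```python
# Hypothetical values of the force f = d_1 H(y1, y2) for four values of y2,
# together with their conditional weights P_{2|1}(y2 | y1) at one fixed y1.
forces = [0.3, -1.2, 2.0, 0.7]
weights = [0.1, 0.4, 0.2, 0.3]

# Conditional average of the force, the discrete analogue of (D3).
h1 = sum(w * f for w, f in zip(weights, forces))

# Averaged bracket that appears after inserting the mean displacement rate
# into (D6), written in units of the common prefactor of both terms.
bracket = sum(w * ((h1 ** 2 - f ** 2) - 2.0 * (h1 - f) * f)
              for w, f in zip(weights, forces))

# Mean square of the force minus its conditional average.
phi_squared = sum(w * (f - h1) ** 2 for w, f in zip(weights, forces))

print(bracket, phi_squared)
```

Both numbers coincide, so the integrand of the transfer entropy is a conditional variance of the force and therefore non-negative.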